Preface



The ongoing migration of computing and information access from the desktop and tele- 
phone to mobile computing devices such as PDAs, tablet PCs, and next-generation (3G) 
phones poses critical challenges for research on information access. 

Desktop computer users are now used to accessing vast quantities of complex data 
either directly on their PC or via the Internet - with many services now blurring that 
distinction. The current state-of-practice of mobile computing devices, be they mobile 
phones, hand-held computers, or personal digital assistants (PDAs), is very variable. 
Most mobile phones have no or very limited information storage and very poor Internet 
access. Furthermore, very few end-users make any, never mind extensive, use of the 
services that are provided. Hand-held computers, on the other hand, tend to have no 
wireless network capabilities and tend to be used very much as electronic diaries, with 
users tending not to go beyond basic diary applications. 

This “state-of-practice” presents a dramatic contrast to the technological vision, 
and the emerging “state-of-the-art” devices, which are small, very powerful, wireless 
networked computing platforms. Providing access to large quantities of complex data 
on such devices while users are on the move and/or engaged in other activities poses 
significant challenges to the information access community and brings together many 
classical computing domains, such as information retrieval (IR), human-computer in- 
teraction (HCI), information visualization, and networking. This volume contains 21 
papers that approach these challenges from different directions. The bulk of the papers 
come from the Workshop on Mobile and Ubiquitous Information Access that was held 
as part of Mobile HCI 2003 in September 2003. 1 Other papers were specially invited, 
to complement the presented papers and extend the volume. 



Overview 

The 21 papers in this volume have been grouped into the following four parts. Many 
of the papers fall into more than one category, and sometimes our choice has been 
somewhat arbitrary, but hopefully still useful. 



Foundations: Concepts, Models, and Paradigms 

The field is young, so it is not a surprise that some work is being done on basic concepts 
and visions of the future. In The Concept of Relevance in Mobile and Ubiquitous Infor- 
mation Access, Coppola et al. discuss the concept of relevance in the mobile, wireless, 
and ubiquitous information retrieval arena. In Conversational Design as a Paradigm for
User Interaction on Mobile Devices, Leong borrows from well-established linguistics
research and presents a design paradigm for user interfaces on mobile devices based
on Grice’s conversational implicatures. One-Handed Use as a Design Driver: Enabling
Efficient Multi-channel Delivery of Mobile Applications, by Nikkanen, presents several
practical and useful guidelines for mobile devices and applications, based on both a
literature review and lessons learned at Nokia. In the last paper in this part, Enabling
Communities in Physical and Logical Context Areas as Added Value of Mobile and
Ubiquitous Applications, Pichler discusses how to provide added value to mobile users,
stressing the importance of designing services that are very specific to the context
area, and how to foster communities based on both physical and logical contexts.

1 Mobile HCI 2003 was part of the Mobile HCI series (see www.mobilehci.org); its
proceedings were published in LNCS volume number 2795.

Interactions 

Of course, interaction problems are paramount. One of the key issues when working 
with mobile devices is how to input data to a mobile device with very poor input de- 
vices. The other, symmetrical, key issue is how to fully exploit the small available dis- 
play area. The second paper of this part discusses the former; the other ones the latter. In 
Accessing Web Educational Resources from Mobile Wireless Devices: The Knowledge 
Sea Approach, Brusilovsky et al. evaluate the use of Self-Organizing Maps (SOMs) 
for information access to educational resources. In Spoken Versus Written Queries for 
Mobile Information Access, Du et al. analyze IR effectiveness when the query is in- 
put via speech: they present a prototype and its experimental evaluation. In Focussed 
Palmtop Information Access Combining Starfield Displays with Profile-Based Recom- 
mendations, Dunlop et al. present two applications using starfield displays on a PDA 
and exploiting advanced collaborative filtering techniques: Taeneb CityGuide recom- 
mends restaurants and Taeneb ConferenceGuide presents the timetable of a conference. 

Applications and Experimental Evaluations 

Several approaches are used for implementing applications. Following a strong tradition 
in both the HCI and IR communities, evaluation is deemed a crucial issue and several 
papers focus on experimental studies of mobile applications. In Designing Models and 
Services for Learning Management Systems in Mobile Settings, Andronico et al.
present a survey of previous systems for mobile learning, and describe an ongoing project.
Cignini et al., in E-Mail on the Move: Categorization, Filtering, and Alerting on Mobile
Devices with the ifMail Prototype, present a prototype allowing e-mail categorization,
filtering, and alerting on mobile devices, and its first experimental validation. In Mobile
Access to the Físchlár-News Archive, Gurrin et al. illustrate the Físchlár-News system,
which processes digital video and audio news stories and is capable of segmentation,
collaborative filtering-based recommendation, and delivery on mobile devices. Mai et
al., in A PDA-Based System for Recognizing Buildings from User-Supplied Images,
describe a prototype providing navigational and informational services to an urban mobile
user based on GPS and building recognition achieved through image processing tech- 
niques. In SmartView and SearchMobil: Providing Overview and Detail in Handheld 
Browsing, Milic-Frayling et al. overview their SmartView technology, which makes 
Web pages with complex layout more accessible to mobile devices, and show and eval- 
uate its integration into SearchMobil, to help the users of a small screen display esti- 
mate the relevance of retrieved Web pages. The paper titled Compact Summarization 
for Mobile Phones, by Seki et al., deals with the very important (for mobile devices)
issue of summarization: these authors present a new summarization method based on 
the genre of a document and they evaluate it. On the same topic, Sweeney et al. in Sup- 
porting Searching on Small Screen Devices Using Summarisation discuss and evaluate 
by means of a user test how summarization can improve IR on small screen devices. 
In Towards the Wireless Ward: Evaluating a Trial of Networked PDAs in the National
Health Service, Turner et al. discuss and evaluate, by means of an on-field user study, 
several important issues on the usage of PDAs in the medical field. Finally, in Aspect- 
Based Adaptation for Ubiquitous Software, Zambrano et al. delve into software engi- 
neering issues: they propose Aspect Oriented Programming (AOP) as a solution to deal 
smoothly with issues that are peculiar to the design of mobile device applications and 
that are not found when designing standard desktop applications. 

Context and Location 

A hot issue in mobile device research is, of course, how to take into account and exploit 
the context in which the user is. In Context-Aware Retrieval for Ubiquitous Computing 
Environments, Jones et al. perform a thorough analysis of context-aware retrieval: they 
present definitions, links with other disciplines (IR, information filtering, agents, HCI), 
and a description of their own findings. Nussbaum et al., in Ubiquitous Awareness in 
an Academic Environment, propose and evaluate a prototype that, on a campus, en- 
hances student relationships by fostering face-to-face meetings. In Accessing Location 
Data in Mobile Environments: the Nimbus Location Model, Roth proposes the Nimbus 
framework, a formal model for location information, integrating physical and semantic 
information. The paper A Localization Service for Mobile Users in Peer-to-Peer En- 
vironments, by Thilliez et al., describes a localization service based on a peer-to-peer 
(P2P) architecture, featuring location-based queries. Finally, in the last paper of this 
volume, Sensing and Filtering Surrounding Data: the PERSEND Approach, Touzet et
al. present an application dealing with the issues of distributed databases, proximate 
environments, and continuous queries. 

Abstract. We discuss how the wireless-mobile revolution will change the notion
of relevance in information retrieval. We distinguish between classical
relevance (e-relevance) and relevance for wireless/mobile information retrieval
(w-relevance). Starting from a four-dimensional model of e-relevance previously
developed by one of us, we discuss how, in a ubiquitous computing environment,
much more information will be available, and how it is therefore
likely that w-relevance will be more important than e-relevance to survive in- 
formation overload. The similarities and differences between e-relevance and 
w-relevance are described, and we show that there are more differences than 
one might think at first. We specifically analyze the role that beyond-topical cri- 
teria have in the w-relevance case, and we show some examples to clarify and 
support our position. 



1 Introduction 

It may surprise you, but we can hardly imagine what information overload is. Just
stop for a minute and think how our world will probably be in ten years or so. As soon
as mobile wireless devices ubiquitously enter our lives, today’s complaints
about having access to too much information will be seen with a small ironic grin and
perhaps some nostalgia. We are not speaking only of palmtop devices, cellular
phones, laptop computers, pagers, MP3 players, and similar already commonly used
devices; we are thinking also of networked digital cameras and video cameras,
thermometers, traffic lights, GPSs for cars and - why not - bikes, skates, pogo sticks, and
even walking people, game stations, and so on. Thousands of interconnected information
processing devices will be available to each of us anytime, anywhere. Each mobile
device will sense its environment to gather information from the physical world
and make it available to its user (or users). Each device will also exchange information
with other (mobile and non-mobile) devices, mainly by means of some wireless
communication network. Probably, users will (continue to) directly exchange information
among themselves. Also, devices will probably change the physical environment, to
a greater extent than today’s static and non-ubiquitous desktop machines. A similar
view is expressed, for instance, in [12].

All the mobile devices can be seen, from the user’s point of view, as information
access tools: they will filter incoming information and retrieve available information,
trying to present to the user all and only the relevant information. Of course, the user
will be interested in accessing information that is not only relevant in the strict sense,
but also of a high quality, timely, serendipitous, of the appropriate grain size, perhaps
rare, and so on. Since there is no agreement about which of these features are
relevance features, we will use the term “relevance” in a very general way, taking
relevant information to be the information that the user wants.

But what is relevance in the new mobile/wireless/ubiquitous scenario? This paper
is a first and preliminary attempt at answering this question. We hope both to help
traditional information retrieval researchers to appreciate some complications peculiar 
to the mobile domain, and to persuade researchers working in the mobile and wireless 
field of the importance of the information access approach. Therefore, we try to stay 
at a level high enough to be understandable by an interdisciplinary audience. 

The paper is organized as follows. In Section 2 we will briefly overview the re- 
search about the concept of relevance in classical non-mobile Information Retrieval 
(henceforth IR). We name relevance in classical IR e-relevance (for electronic rele- 
vance, but this is not the only reason, as we will explain in Section 5). In Section 3 we 
will re-analyze the relevance concept in the mobile case. In turn, we name this rele- 
vance w-relevance (for wireless relevance, but, again, see Section 5). We show that, 
from an intuitive point of view: (i) w-relevance is an extension of e-relevance; (ii)
w-relevance differs from e-relevance much more than one might think at first; and (iii)
beyond-topical criteria, one aspect of e-relevance that has recently received a lot of 
attention in non-mobile IR, are both much more emphasized and much more impor- 
tant in the mobile case. In Section 4 we propose some simple examples and scenarios 
to support our position. Section 5 concludes the paper. 



2 E-Relevance: The Non-mobile Information Retrieval Case 

Relevance (e-relevance) is a subject that has been intensely studied for years in the IR 
field, and it is still a hot topic today. We will not review the field in detail, since some
well known surveys are already available [14, 20, 21, 22, 23]. 

Classical information retrieval equates e-relevance with topicality: the query sub- 
mitted to an IR system specifies the topic(s) that a relevant document has to deal with. 
For example, if a university professor is looking for documents to prepare her next 
lesson for this afternoon, she needs of course documents that deal with the matter that 
she is going to explain to her students. But she also wants those documents as soon as 
possible (if a document arrives after the lesson, it is useless), at the right complexity 
level (if a document is too difficult, students will not understand it), and so on. And 
these features go beyond the topic: they are completely independent of it. 

Therefore, the topical view is short-sighted. Indeed, we now have a large body
of research that demonstrates how topic is only one of the criteria that users use when
judging the e-relevance of the retrieved documents. For a review of this line of
research, which started in the 1960s and received a lot of attention (especially at
Syracuse University) in the 1980s and 1990s, see [1, 14]. Since the criteria, elicited from
users or found by experts, tend to constitute a stable set (i.e., very few new criteria are
found in the most recent studies), it is likely that we have an almost correct and
complete list of relevance criteria.

Actually, the exploitation of beyond-topical criteria is not the only way to get 
closer to the “real” relevance, i.e., the relevance the user is interested in. A more gen- 
eral approach that takes into account this aspect has been proposed by one of us some 
years ago [9, 15]: the various kinds of relevance are classified in a four-dimensional 
space, distinguishing among them on the basis of a precise classification. The four 
dimensions are: 

• Information resources, containing document, surrogate, and the information that 
the user receives when reading a document. 

• Representation of the user problem, containing the real information need, the per- 
ceived information need, the request (or expressed information need), and the 
query (or formalized information need). 

• Time, containing the time instants from the arising of the user’s need to its satisfac- 
tion. 

• Components, containing topic, task (what the user has to do with the retrieved 
information), and context (everything beyond topic and task as, for example, what 
the user already knows about the topic being sought, or the time that the user has to 
complete the search). 

These four dimensions allow one to distinguish among the various kinds of rele- 
vance, and to speak, for instance, of: the relevance of a document to the query at 
query expression time for what concerns the topic component (the classical relevance 
used in IR); the relevance of the information received to the real information need at 
the time of final need satisfaction for what concerns topic, task, and context (the rele- 
vance the user is interested in); and so on. This classification can be used in the im- 
plementation and evaluation of IR systems. 
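
As an illustration only, the four dimensions can be read as the coordinates of a "kind of relevance". The sketch below is not taken from [9, 15]; the class and field names are assumptions made for this example, and the time dimension is simplified to a text label.

```python
from dataclasses import dataclass
from enum import Enum


class Resource(Enum):
    SURROGATE = "surrogate"
    DOCUMENT = "document"
    INFORMATION = "information"


class Problem(Enum):
    REAL_NEED = "real information need"
    PERCEIVED_NEED = "perceived information need"
    REQUEST = "request"
    QUERY = "query"


class Component(Enum):
    TOPIC = "topic"
    TASK = "task"
    CONTEXT = "context"


@dataclass(frozen=True)
class RelevanceKind:
    """One point in the four-dimensional space (hypothetical encoding)."""
    resource: Resource
    problem: Problem
    time: str              # label of a time instant, e.g. "query expression"
    components: frozenset  # subset of Component


def describe(kind):
    """Spell out a kind of relevance in words."""
    parts = ", ".join(sorted(c.value for c in kind.components))
    return (f"relevance of the {kind.resource.value} to the {kind.problem.value} "
            f"at {kind.time} time, with respect to {parts}")


# The classical relevance used in IR.
classical = RelevanceKind(Resource.DOCUMENT, Problem.QUERY,
                          "query expression", frozenset({Component.TOPIC}))

# The relevance the user is interested in.
user_centred = RelevanceKind(Resource.INFORMATION, Problem.REAL_NEED,
                             "final need satisfaction", frozenset(Component))

print(describe(classical))
print(describe(user_centred))
```

Encoding each kind as an immutable value makes it easy to state which kind of relevance a given system or evaluation actually measures.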

This topic/task/context distinction has been used to some extent. Reid [18]
proposed an evaluation methodology that uses the task as the starting point for building a
test collection. The development of IR systems dealing with beyond-topical
e-relevance has been rather slow; however, some examples now exist. Researchers at
MIT recently developed an IR system that, in some way, goes beyond topical criteria
[13]. This system, named GOOSE (GOal Oriented Search Engine), allows the user to
choose among a list of tasks (called “goals” by GOOSE authors), and uses a large
common sense knowledge base to exploit the task specification for building a better
query. In such a way, Liu and colleagues implemented, perhaps without explicitly
noting it, an IR system that tries to work taking into account beyond-topical factors of
relevance, as suggested in [15].

One can also assume that, although each search and each information need concern 
a different topic, there are indeed some beyond-topical components of user’s needs 
that are more stable, i.e., the context in which the consecutive search sessions by one 
user take place [10, 11]. Some first experiments show that, for a given user, contexts
are indeed more stable than topics, and may be used to improve the ranking of docu- 
ments retrieved after a query, but the usefulness of this approach is still under investi- 
gation. 

Another approach for including beyond-topical criteria in an IR system is to build
an IR assistant, namely a system that, during information seeking, observes user
behavior and gives suggestions aimed at improving the effectiveness of the search and
of the searcher [3, 4, 16]. Some of the suggestions might be of a topical nature (e.g.,
to add some terms to the query to better represent the topic being sought), but also
non-topical suggestions can be provided, like suggesting a paper related in some way 
to those judged as relevant so far (e.g., the PhD thesis by, or a short biography of, the 
author of a paper judged as relevant, or a references list, and so on). This line of re- 
search has still to be proven effective, but initial laboratory experiments show positive 
results. Also “just-in-time information retrieval agents” [19] build their queries with 
beyond-topical components (mainly context). 

Even if the existence of beyond-topical criteria for e-relevance is not in dispute,
what seems not yet recognized, or assessed, is the actual importance of these criteria 
in real-life IR. In the next section we discuss, on the basis of the classification in [15], 
how and why the w-relevance scenario is different. 



3 W-Relevance: The Mobile Information Retrieval Case 

One might simply repeat the above analysis in the w-relevance case, and thus just 
state that there are various kinds of w-relevance and there are some beyond-topical 
components of w-relevance that should not be overlooked. However, we believe that 
there are important differences between e-relevance and w-relevance. The beyond- 
topical criteria in the mobile IR case become more critical: they are different from, 
and have a higher importance than, those in non-mobile IR. Therefore, topicality is an
abstraction that works in a perhaps satisfying way (even if far from perfect) in the
e-relevance case but, as soon as the real world comes into play, the shortcomings of this
approach are manifest (examples will be shown in Section 4). Also, there are more
kinds of w-relevance than kinds of e-relevance. As we will discuss in the following,
the main reason for these differences is that in the e-relevance case we can
comfortably sit inside the “information world”, whereas in the w-relevance case we have to
move into the “real/physical world”.

All the e-relevance models proposed in past years need to be modified to become 
adequate models of w-relevance. In this section we revise and extend the model
proposed in [15], in each of the four above-mentioned dimensions.



3.1 Information Resources 

In the non-mobile case, the user of an IR system is usually interested in retrieving 
information; a typical user is a scholar that needs documents on a new topic, to study 
them, to write a paper or book, and so on. This is obtained by retrieving a number of 
information sources (books, articles, Web pages, etc.), from which the user can ex- 
tract the relevant information. In the mobile IR scenario, it is often the case that the 
user is interested not just in information, but in obtaining some (possibly material) 
thing, only partially described by information (e.g., a physical place or a pair of blue 
jeans): besides surrogate, document, and information, the information resources
dimension should therefore also include things, and should perhaps be renamed as
resources. In other terms, often the retrieved information and, in general, the database
are instrumental, since they are means to reach the end of possessing, or obtaining,
some thing, not the end itself. Besides the relevance of retrieved information, we also
have the relevance of the retrieved thing: the user will not evaluate the information
sources, but the described physical object, which in the meantime might change or
disappear even without this being immediately reflected in the information source
content. This brings up the issue of consistency between the database and the real world.



3.2 Representation of the User Problem 

Since in the mobile IR scenario, besides the real information need (that is in turn 
beyond the information need perceived by the user), it is often the case that the user is 
interested in some thing, we can say that the user usually has a thing need (that should 
then be added to the second dimension, namely the representations of the user prob- 
lem). Therefore, if we look at the first two dimensions, we can say that from the rele- 
vance of the retrieved information to the information need, we have moved to the 
relevance of the retrieved thing to the thing need. Using Bateson's [2] terminology, 
w-relevance deals more with Pleroma (the physical world), whereas e-relevance deals 
mainly with Creatura (the informational world): in w-relevance we have a much 
stronger coupling with the real, physical world. If someone photocopies an article in a
library (e-relevance scenario), you can still read the article later. If someone buys
the last pair of your favorite blue jeans just after your query to a “blue jeans
database”, you cannot have them anymore (w-relevance scenario). In the former case you
are interested in information, whereas in the latter you are interested in a thing.



3.3 Time 

Another dimension of relevance that increases its importance in the w-relevance case 
is time, in two senses. First, often the user needs “quick and dirty” information: things
change faster, replication is more difficult. Second, in the real world, since time is
irreversible, if something is lost, it is lost. In the Creatura one can often rely on
backups, copies, and replication; in the real world, “carpe diem”. This is perhaps the deep
motivation behind the often stated claim that users of mobile devices are more inter- 
ested in precision than in recall, usually justified, in a perhaps too simplistic way, by 
the small display area on mobile devices: having a full list of the relevant items can be 
useless if the list is so long that the time required for examining it is longer than the 
lifetime of relevant items. 
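
The argument can be made concrete with a small calculation. The sketch below is only an illustration under simplifying assumptions that are not in the original text: every item takes the same time to examine, all relevant items share one lifetime, and the function names and numbers are hypothetical.

```python
def examinable_items(ranked, seconds_per_item, lifetime_seconds):
    """Return the prefix of a ranked list that can be read before items expire."""
    if seconds_per_item <= 0:
        raise ValueError("seconds_per_item must be positive")
    budget = int(lifetime_seconds // seconds_per_item)
    return ranked[:budget]


def effective_recall(ranked, seconds_per_item, lifetime_seconds):
    """Fraction of the relevant items that the user can actually reach in time."""
    total_relevant = sum(ranked)
    if total_relevant == 0:
        return 0.0
    seen = examinable_items(ranked, seconds_per_item, lifetime_seconds)
    return sum(seen) / total_relevant


# True marks a relevant item; the list is in the order shown on the device.
result_list = [True, False, True, False, True, True, False, True]

# With 10 seconds per item and items that expire after 30 seconds,
# only the first three entries can be examined.
print(effective_recall(result_list, seconds_per_item=10, lifetime_seconds=30))
```

Under these assumptions the relevant items beyond the third position contribute nothing, however complete the list is, which is why the top of the list matters more than its length.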

Another aspect of w-relevance, related to both the strong coupling with the real- 
world and time, is the database change rate: since the real world changes quickly and 
continuously, the database has to quickly change accordingly to stay up-to-date. 

The intuitive importance of time is also confirmed by a survey made last year in
Singapore among users of PIRO, a commercial system developed by C5solutions [5].
PIRO presents to mobile users using WAP phones the directory listings of commercial
retailers relevant to the user’s current need. Eight users filled in a 29-question
questionnaire designed to rate the importance of various relevance criteria for
presenting commercial applications on a mobile device. Of course the small sample size
does not allow any certain inference, but it is worth noting that two out of the three
highest rated criteria are “information is current (up to date)” and “information is
about a sale or promotion or money saving opportunity”, both of which concern time
features.



3.4 Components 

“Context” is a hot word in the mobile/wireless scene, but with a different meaning 
from that used above [6, 7, 8, 17]: context usually refers to the current environment, 
the situation, that the user of a mobile device is experiencing while using it. Let us see 
some examples of this usage of the term. 

Location is one of the most mentioned aspects of context [26]: from the user posi- 
tion (derived by means of GPS, or triangulation in a Bluetooth or Wi-Fi network) 
other information can be inferred and exploited in various ways, for example to in- 
crease the relevance of the information accessed by the user, or to improve the inter- 
action with the user. 

Of course, location is important, but there is more to context than location [24]. 
Location itself is not the only information we can get from the spatial position of the
user. Indeed, this feature is only a “static” one, whereas much additional information
can be inferred from the dynamic evolution of locations. For example, if the user
looks for traffic information, and she is moving along a road, it is very likely that she
wants information about the road she is currently on, rather than the whole national
traffic news. Therefore, the user’s track, i.e., the temporal sequence of locations traveled
by a user, is another aspect of context. Note that context can also be predicted;
for example, a full track can be inferred if the user’s scheduler reveals that she
has an appointment in half an hour at a certain place. The speed at which the
user travels along a certain track is also an important parameter: a slowly walking
user can be presented more information than a running one [25].
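
A track and the speed along it can be derived from timestamped positions. The sketch below is a minimal illustration, not the method of [25]: the haversine great-circle distance, the `Fix` record, and the threshold and item counts are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, in meters


@dataclass(frozen=True)
class Fix:
    """One timestamped position of the user (hypothetical record)."""
    t: float    # seconds since the start of the track
    lat: float  # degrees
    lon: float  # degrees


def haversine_m(a, b):
    """Great-circle distance between two fixes, in meters."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def average_speed_mps(track):
    """Average speed along a track (a time-ordered list of fixes)."""
    if len(track) < 2:
        return 0.0
    distance = sum(haversine_m(a, b) for a, b in zip(track, track[1:]))
    elapsed = track[-1].t - track[0].t
    return distance / elapsed if elapsed > 0 else 0.0


def items_to_show(speed_mps, walking_limit=2.0, max_items=10, min_items=2):
    """Show fewer items to a fast-moving user; the limits are illustrative."""
    return max_items if speed_mps <= walking_limit else min_items


walking = [Fix(0, 46.000, 13.0), Fix(100, 46.001, 13.0)]
running = [Fix(0, 46.000, 13.0), Fix(10, 46.001, 13.0)]
print(items_to_show(average_speed_mps(walking)))
print(items_to_show(average_speed_mps(running)))
```

A real system would also have to smooth noisy position readings before trusting the derived speed.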

Other common examples of context aspects are: the noise level in the environment 
(that can and should affect the volume of a mobile phone); the light level in the envi- 
ronment (affecting the display illumination); the orientation of the device (affecting 
the orientation of the displayed information) [24].

Therefore, “context” has a different meaning in w-relevance: in e-relevance, context
concerned only (or perhaps mainly) what was in the user’s mind; in w-relevance, context
is also (perhaps mainly) about the real world. We will distinguish between these
two meanings by using the terms e-context and w-context. W-context is more general
than e-context, is much more dynamic, and is more likely to change during the
information seeking activity. It is also worth noting that Goker’s above-mentioned
preliminary results on context stability [10, 11] might not hold in the mobile
environment. On the other hand, whereas e-context has to be provided manually to an IR
system, w-context is likely to be derived autonomously and more easily; the reason is
that many of the components that belong to w-context and do not belong to e-context
can, at least theoretically, be inferred automatically (e.g., location, track, noise level,
presence of other persons, and so on). Yet, the use of automatically derived w-context
might result in a less accurate classification of items, due to the assumptions introduced
in the model exploited for building the w-context. This suggests that the system should
not filter out the items it estimates as not w-relevant, and should prefer a more
careful information visualization approach, e.g., ranking the retrieved items.
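
The difference between filtering and ranking can be shown with a toy example. This is only a sketch: the item names, the scores, and the threshold are hypothetical, and the scores stand for whatever w-relevance estimate a system derives from an automatically inferred w-context.

```python
def filter_items(scored, threshold=0.5):
    """Hard filtering: items scored below the threshold are never shown."""
    return [name for name, score in scored if score >= threshold]


def rank_items(scored):
    """Ranking: every item is kept, ordered by estimated w-relevance."""
    ordered = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ordered]


# (item, estimated w-relevance); the estimate may be wrong.
scored = [("pharmacy", 0.9), ("bookshop", 0.4), ("cafe", 0.7)]

print(filter_items(scored))  # the bookshop is lost if its score was underestimated
print(rank_items(scored))    # the bookshop is still reachable, just lower in the list
```

With ranking, an estimation error costs the user some scrolling; with filtering, it hides the item altogether.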

4 W-Relevance Scenarios

In this section we show some realistic scenarios that support the above discussion. We 
are aware of the importance of privacy issues, but we do not take them into account in 
this paper. 

4.1 Catching the Train 

Let us consider a query for finding a train from Udine, Italy to Milano, Italy, made to 
the Italian national railways web site (http://www.trenitalia.com). At present, you fill 
a form with (at least) departure and arrival cities, and starting date and time. If the two 
cities have more than one railway station, you are also asked to select the specific 
stations you want. Then, you receive a list of trains departing from that time onward,
and if you want details on one of them, you have to click on its link, then go to the
price link and fill in a further form where you specify how many tickets, in which class, etc.

What if you are reaching the station in a hurry in a taxi, just in time for a train?
You do not need to be informed about all the trains from Udine to Milano in the next
two hours: you just need to get quickly to where you usually go, i.e., Milano Piazza
Garibaldi railway station, in second class as usual, by the first train available. Other
factors include: you are not alone, but with your husband/wife (your PDA is sensing
him or her around you); the ticket can be bought automatically (using the available
details of your credit card); and the first train retrieved might not be useful because
there are not two free seats in second class. All these data can be derived from your
own w-context and an up-to-date (with respect to the real world) database.

4.2 Driving to a Conference 

Let us imagine that you are driving your car, rented at Venice airport, towards Udine 
to attend the Mobile HCI 2013 conference. Your car is of course equipped with a GPS
and a driving assistant, giving you directions about the route to follow. In this situa- 
tion, the information that the car 100 meters in front of you is going to Udine too is 
very relevant, and should be immediately notified to you so that you might follow that 
car without worrying about road directions. Moreover, if the driver in the other car is 
a good friend of yours, you should be notified about that too, since you might want to 
contact her (with an SMS?) for sharing the trip or just having a coffee together. 

In this scenario, the topic is straightforward, being the destination of your trip 
(Udine); the task is given mainly by driving in a convenient way, with perhaps some 
subtasks given by sharing the trip for economy, avoiding pollution, just chatting. The 
latter case is even more interesting if the driver is a friend, but this is expressed
neither in topic nor in task, but most likely in the e-context - your address book, your
last phone calls, etc. You might go on, and think of the situation if your good friend is 
not a good driver at all, or if her car is a very old one (and these are w-context as- 
pects). 

4.3 E-Commerce Application 

Now you are a trendy boy/girl, walking around a shopping center, and wanting
to buy some fashionable trousers (“Gasoline” brand) and a newly available “Mosquito”
shirt. You do not want to spend too much money, so you ask your PDA to look
for those clothes at a good price. Two different outcomes might be considered: (i)
you are using a traditional e-relevance based IR system, thus you will receive a list of
offers on eBay, followed by some online shop catalogues showing very good prices for
the same clothes; unfortunately, you are out shopping, and you want to have your
clothes now rather than wait for their postal delivery; (ii) your PDA queries a w-relevance
based IR system which, on the basis of your location, track, and walking
speed, is able to infer that you probably want a specific place in which to buy such
clothes, possibly close to you.

So you will receive three shop addresses: the closest one is not the first because it 
is slightly more expensive than another one, which in turn is sufficiently near to be 
reached before closing time. A third shop is listed with good prices but with just one 
shirt of your size (as recorded in your personal profile, or communicated after request 
by the radio/infrared label applied on the shirt you are actually wearing); you have to 
run before someone else buys it, or perhaps you can book it by means of an electronic 
message. Again, the topic is Gasoline trousers, Mosquito shirt, good price, but the real 
task is to actually buy them, not to know where to buy them. Real things get sold out,
usually on a first-come, first-served basis: what is true now (e.g., availability of my
size) could be false in a few minutes, thus time matters too.
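The ranking described in this scenario can be illustrated with a minimal sketch. It is not an implementation from this paper: the `Shop` fields, the `rank_shops` function, and all the numbers are hypothetical, and the rule used (discard shops that cannot complete the purchase, then sort by price) is only one plausible reading of the example.

```python
from dataclasses import dataclass


@dataclass
class Shop:
    name: str
    price: float             # total price for the requested items
    walk_minutes: float      # estimated walking time from the user's position
    minutes_to_close: float  # time left before the shop closes
    items_in_stock: int      # items available in the user's size


def rank_shops(shops, max_price):
    """Keep only shops where the purchase can actually be completed."""
    reachable = [
        s for s in shops
        if s.items_in_stock > 0                 # real things get sold out
        and s.walk_minutes < s.minutes_to_close  # reachable before closing time
        and s.price <= max_price                 # "good price" from the topic
    ]
    # Cheapest first; ties are broken by walking time.
    return sorted(reachable, key=lambda s: (s.price, s.walk_minutes))


shops = [
    Shop("closest", price=95.0, walk_minutes=2, minutes_to_close=60, items_in_stock=5),
    Shop("cheaper", price=90.0, walk_minutes=8, minutes_to_close=30, items_in_stock=4),
    Shop("last shirt", price=91.0, walk_minutes=12, minutes_to_close=45, items_in_stock=1),
    Shop("closing soon", price=80.0, walk_minutes=20, minutes_to_close=10, items_in_stock=9),
]
ranked = rank_shops(shops, max_price=100.0)
print([s.name for s in ranked])
```

With these made-up values the closest shop is listed last because it is the most expensive, and the cheapest shop of all is dropped because it closes before the user can reach it.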

4.4 A Museum Application 

Just in front of a beautiful Van Gogh picture, you want to have some more information
to understand why he is painted with a bandaged ear. The wireless service available
there of course does not need to show you the picture itself: you just need textual
background information, as in your current w-context there is the real availability of
the object which generates the topic for the query. In the wireless-enabled environment,
your PDA should handshake with a radio/infrared label applied near the picture,
so that, in addition to knowing your exact position, it is also able to automatically
generate part of the query.



5 Conclusions and Future Work 

In this paper we have shown how the mobile/wireless/ubiquitous revolution is likely
to bring big changes into the IR field, and how even a concept as foundational as
relevance needs to be re-analyzed and re-defined.

The relevance in classical electronic environments (non-mobile ones) that we have 
named e-relevance, is actually an irrelevance, because many features of it are ne- 
glected, or at least not given the importance that they should have in the general case. 
In the mobile IR case, this generality is easier to notice: w-relevance does not
only mean wireless relevance, but also double-relevance, world-relevance (since the
physical world is much more involved) and double-user-relevance, since it is a notion
of relevance that is much closer to what the users want and need.

The current model for information retrieval - with one user and one system - is 
also challenged in a peer-to-peer wirelessly connected environment, where ultra- 
mobile devices are available for providing information to each other. Information will 
be available from many devices through many channels, either phone-like (WAP,
GPRS, UMTS) or local networking (Wireless LAN, Bluetooth, IrDA). Such devices
may in turn also provide w-context information, i.e., location (in a broad sense, not
only geographical coordinates), track, temperature, etc. In such a complex peer-to-peer
scenario, it is likely that a single query made by my device could be answered by
more than one system, and that each system could be engaged in a sort of “reverse
relevance”, asking itself something like “Am I able to answer such a query?”,
which in turn could be translated as “Is such a query relevant to my database?”. As
device answers may have a cost for the user, it is also likely that the query should
involve budgetary considerations. Useful hints about how to deal with this kind of
interaction may come from the multi-agent paradigm research area [27].
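As a rough illustration of “reverse relevance” combined with a budget, consider the following sketch. Everything in it is an assumption made for the example: the `Peer` class, the term-overlap test standing in for a real relevance judgment, the cost values, and the greedy cheapest-first selection.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    topics: frozenset  # terms covered by the peer's database (hypothetical index)
    cost: float        # what the peer charges for an answer

    def is_relevant(self, query_terms):
        """Reverse relevance: is this query relevant to my database?"""
        return bool(self.topics & query_terms)


def select_peers(peers, query_terms, budget):
    """Pick willing peers, cheapest first, without exceeding the budget."""
    chosen, spent = [], 0.0
    for peer in sorted(peers, key=lambda p: p.cost):
        if peer.is_relevant(query_terms) and spent + peer.cost <= budget:
            chosen.append(peer.name)
            spent += peer.cost
    return chosen, spent


peers = [
    Peer("train-kiosk", frozenset({"train", "timetable"}), cost=0.0),
    Peer("city-guide", frozenset({"museum", "restaurant"}), cost=0.5),
    Peer("ticket-broker", frozenset({"train", "ticket"}), cost=1.0),
    Peer("premium-rail", frozenset({"train"}), cost=2.0),
]
chosen, spent = select_peers(peers, frozenset({"train", "ticket"}), budget=1.5)
print(chosen, spent)
```

Here each peer decides for itself whether to answer, while the querying device decides how many answers it can afford; a multi-agent negotiation would replace the simple greedy loop.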

Moreover, another problem needs to be mentioned. On the one hand, it seems reasonable
that as soon as some information is available and potentially relevant in the
future, it should be stored locally on one’s own device to be accessible later. This is
even more reasonable if one takes into account that wireless devices are not always
connected to the network, and that they use different network connections, with different
transfer rates, reliability, privacy, and cost. On the other hand, however, small
devices are more resource constrained: low computational and storage power, low
energy availability, and low bandwidth, also in the interaction with the user. All these
features suggest that the local storage of information is not always the best
choice. Also, locally stored data can quickly become outdated because of the
rapid change rate of the database, thus raising consistency problems. These
issues need further investigation.

In the future we plan to work on the four-dimensional relevance model in order to
make it more accurate and formalized. We also intend to use the revised model in the 
implementation and evaluation of mobile IR systems. 

Conversational Design as a Paradigm for User Interaction on Mobile Devices

Abstract. This paper borrows the Cooperative Principle and the idea of conversational
implicatures from H.P. Grice’s Logic and Conversation. We apply them to
user interface design, specifically for mobile devices. We introduce the idea of
conversational design, which is that user interfaces should be designed to flow 
as if they were a conversation between two cooperative entities. We present a 
case study on mobile phone user interface and show where conversational de- 
sign provides a new perspective on interface design and how such an analysis 
helps create better interfaces for mobile devices. 



1 Introduction 

This paper introduces the idea of conversational design inspired by readings in the 
Philosophy of Language, specifically by the work of H.P. Grice 1 . By definition, hu- 
man conversation is a cooperative endeavor with both parties working to exchange 
information. Otherwise, it would merely be two simultaneous monologues. 

Grice proposes what he called Conversational Maxims, which are non-prescriptive 
rules of what each party 2 in a conversation expects from the other. What is remarkable 
about the maxims is the way they generalize out of the linguistic domain into any 
form of communication between two (or more) parties where it is possible to
distinguish a series of interactions between them. We propose to use these maxims as
design principles in human-computer interaction, and in formulating test suites for
judging the efficiency and completeness of the interaction. 

What is also important is that, linguistically, human beings are not slaves to the 
maxims. They are often broken (maliciously or inadvertently) and this results in either 
a partial or complete breakdown in the communication flow. The conversation, how- 
ever, does not end in the case of a breakdown; rather there is a (series of) correction(s) 
which automatically puts the conversation back on track. One may suggest that the 
maxims codify certain assumptions on the part of the speaker and hearer which allow
conversation to occur fast (the assumptions reduce the search space of the context), 
fluidly (shared assumptions), but not always accurately (misunderstandings). 



1 H. Paul Grice was an influential philosopher of language in the middle part of the 20th century.
He is best known for his work on conversational implicatures, first presented at the
1967 William James Lectures. 

2 For simplicity, we assume our conversations occur between two parties and not more. 

F. Crestani et al. (Eds.): Mobile and Ubiquitous Info. Access Ws 2003, LNCS 2954, pp. 11-27, 2004.

This has interesting ramifications for user interface design as well. If we, as humans
engaged in conversation, sometimes break down and automatically repair, then
can we not use that as a model for user interface design? That is to say, we design a
UI that works fast and fluidly but which may occasionally make mistakes (which
can be repaired), rather than trying to achieve error-free communication.

The following section introduces Grice’s Conversational Maxims and provides ex- 
amples of how a language motivated idea translates into other (non-linguistic) modes 
of communication. In particular, we will argue that mobile device interaction takes 
the form of a conversation between the user and the device. Section 3 defines conver- 
sational design including the idea of breakdown and repair and provides a case study 
analyzing the user interface of some prototypical mobile devices. Section 4 validates 
the intuitions of the previous section with some preliminary experiments specifically 
on mobile phones. Section 5 ties the idea of conversational design to the theme of this 
collection of papers, i.e., to mobile information access, and also considers some im- 
plementation issues. We conclude with a summary and some discussion of future 
work. 



2 Background and Motivation 

Cross-disciplinary approaches to user interface design are common, but they mostly 
borrow from psychology and cognitive science, e.g., the papers in [1] and [2]. In this
paper, however, we are borrowing from the Philosophy of Language, specifically 
from the work of H. Paul Grice in Logic and Conversation [3]. We introduce Grice’s 
work on conversational implicatures, leading to his conversational maxims. We show 
how these maxims apply in non-linguistic modes of communication, ending with an 
example on a mobile device interface. 



2.1 Conversational Implicatures and the Cooperative Principle 

In Logic and Conversation [3], H.P. Grice distinguishes between conventional and 
unconventional implicature 3 in language. Conventional implicature occurs when one 
part of a sentence (or utterance) explicitly implies another, e.g., “He’s fat therefore he 
likes to eat”. Unconventional implicature occurs when what is implied by a sentence 
is different from the lexical content of the sentence itself. A common example comes 
from John Searle’s Speech Acts [4] where “Can you pass the salt?” implies a request 
to pass the salt and is neither a question (despite the grammatical form) nor a query on 
one’s ability to move the salt shaker (which is the lexical content). Other examples 
include uttering “How do you do?” which is just saying hello or “I could eat a horse!” 
which merely expresses a high degree of hunger rather than a desire for an equine 
dinner. 

3 Implicature is just a fancy way of making a noun out of implies.

Grice, however, was interested in a subclass of non-conventional implicatures
which occur in the context of a conversation. A conversation takes place between two
or more parties over time. There is an exchange of utterances (interactions) with intent
to accomplish a common task. This is in contrast to a command or an acknowledgement,
which is strictly unidirectional. So, a conversation has to be a cooperative
effort between the parties concerned. Thus one may formulate a Cooperative Principle,
specifically:

Make your conversational contribution such as is re- 
quired, at the stage at which it occurs, by the ac- 
cepted purpose or direction of the talk exchange in 
which you are engaged. 

In other words, a conversation implies that the utterances of the parties involved have
two components: (a) the lexical content which explicitly exchanges information between
the parties, and (b) the unspoken structure which implicitly works to keep the
conversation going smoothly and efficiently.

It is this second component which Grice calls conversational implicature. This 
may be thought of as a moving set of expectations by each party about the utterances 
of the other party. Grice’s major contribution in Logic and Conversation was to cod- 
ify the rules (or maxims as he calls them) which generate these expectations over the 
course of a conversation. These maxims are presented in the next section. 



2.2 Conversational Maxims 

There are four categories of maxims which result from the Cooperative Principle: 
Quantity, Quality, Relation and Manner 4 . These are listed below and abstracted al- 
most verbatim from [3], and should be self-explanatory. 

The Maxims of Quantity 

This category of maxims deals with the amount of information a conversational party 
would provide: 

a. Make your contribution as informative as is required (for the current purpose of 
the exchange) 

b. Do not make your contribution more informative than is required. 

The Maxims of Quality 

This category has a supermaxim:

a. Try to make your contribution one that is true.

As well as two more specific maxims:

a. Do not say what you believe to be false.

b. Do not say that for which you lack adequate evidence.

The Maxims of Relation 

This category has a single maxim: 
a. Be relevant. 



4 These four categories are named for the categories of perceptual judgment in Immanuel
Kant's Critique of Pure Reason [5].

The Maxims of Manner

Unlike the previous maxims, this category relates not to what is said but how it is 
said, and also has a supermaxim: 

a. Be perspicuous, i.e., clearly expressed and easy to understand.

And various specific maxims:

a. Avoid obscurity of expression 

b. Avoid ambiguity 

c. Be brief (avoid unnecessary prolixity) 

d. Be orderly 

2.3 Examples of Conversational Implicature 

Conversational maxims attempt to codify the implicit implicatures in normal conver- 
sation. As such, positive examples of the implicatures would be trivial and not par- 
ticularly illuminating. What Grice does, and what we will do here as well, is to exam- 
ine examples where the conversation breaks down to illustrate specific maxims being 
violated. 

We will provide examples of maxims being violated in spoken conversation (the
“norm”), in non-verbal exchanges, in a purely graphic exchange, and last, in the context
of mobile devices. We will assume that, in each scenario below, the Cooperative
Principle holds.

Spoken Conversation 

Scenario: My car is running out of petrol; I stop a passerby to ask where the nearest
petrol station is. Here are examples of the maxims being violated. No explanations are
provided as it should be obvious which maxim or sub-maxim is being violated:

• (Maxim of) Quantity:

- I: Excuse me! Hello?!

- Stranger on the road: Yes? Can I help you?

- I’m looking for a petrol station. Is there one nearby?

- Yes, just two blocks down this street; it’s a new one, just built last year and they
have 7 pumps, two of which sell premium gas, and 1 which is for diesel. They
don’t really sell much diesel, but the law requires them to carry it.

• Quality:

- I’m looking for a petrol station. Is there one nearby?

- No, you’ll have to drive to the next town.

• Relation:

- I’m looking for a petrol station. Is there one nearby?

- That’s the problem with the world today! Too much reliance on fossil fuels!
You should be walking; it’s good for you!

• Manner:

- I’m looking for a petrol station. Is there one nearby?

- Yes (then walks away)

Non-verbal Communication

Scenario : You’re helping me to fix a broken door. Similarly, it should again be obvi- 
ous which maxim or sub-maxim is violated: 

• Quantity: I ask for two screws, you give me four. 

• Quality: I ask for a screwdriver, you pass me a wrench. 

• Relation: I’m hammering a nail; you pass me a good book to read. 

• Manner: I’m on top of a ladder and ask for 2 screws; you go to the kitchen for a 
glass of water then come back and give me the screws. 



Graphical Communication

Scenario: You see all of the following official road signs on the same lamp post as
you drive by:

• Quantity: [figure: three road signs, including a straight-only sign] In this case, once
you have the straight-only road sign, there is no need for the other two.

• Quality: [figure: a NO U TURN sign and a U TURN sign] An obvious contradiction.

• Relation: [figure: a GIVE WAY sign and a NO ENTRY sign] The give-way sign is
irrelevant given the no-entry sign.

• Manner: [figure: SPEED LIMIT 25; NO PASSING; 15 MPH when SCHOOL IN SESSION,
with morning and afternoon time windows] The speed limit is ambiguous.



Mobile Device Interaction 

Scenario: I pull out my trusty PDA (personal digital assistant) and select the address 
book function, intending to call my friend, Gareth, at his office. Basically, I’m “ask- 
ing” for his office phone number, and this is what happens: 

• Quantity: I ask for his office phone number, the PDA returns his entire v-card, and 
I have to scroll to the bottom to get his phone number. 

• Quality: I ask for his office phone number, the PDA returns his home phone num- 
ber (because it’s sorted “h” before “o”). 

• Relation: I ask for his office phone number, the PDA (LAN-enabled, of course) 
pops up a message telling me I have email from Gareth. Right thing, wrong time. 

• Manner: I ask for his office phone number, the PDA reboots . . . 
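To make the four violations above concrete, here is a small sketch of an address-book lookup that tries to respect the first three maxims. The contact data, the field names, and the `Assistant` class are invented for illustration and do not describe any particular PDA.

```python
contact = {
    "name": "Gareth",
    "home_phone": "555-0100",    # made-up numbers for illustration
    "office_phone": "555-0199",
    "email": "gareth@example.org",
}


def answer(contact, requested_field):
    """Return only the requested field, or None if it is not recorded."""
    # Quantity: one field, not the whole card.
    # Quality: the field actually asked for, not a neighbouring one.
    return contact.get(requested_field)


class Assistant:
    """Holds back notifications while the user is in the middle of a task."""

    def __init__(self):
        self.deferred = []

    def notify(self, message, task_in_progress):
        # Relation: right thing, wrong time, so keep it for later.
        if task_in_progress:
            self.deferred.append(message)
            return None
        return message


assistant = Assistant()
shown = assistant.notify("New email from Gareth", task_in_progress=True)
print(answer(contact, "office_phone"), shown, assistant.deferred)
```

The maxim of manner (the device rebooting) is a matter of robustness rather than lookup logic, so it is not modelled here.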

2.4 Examples of Breakdown and Repair

If we assume that each of the scenarios above takes place within the limits of the Cooperative
Principle, then each of the examples above is a violation of the respective conversational
maxim. Each example is also an illustration of breakdown in the communication
process.

In many of the cases, repairing the breakdown is also straightforward. This should
not be surprising since we claim that breakdown and subsequent (and often automatic)
repair is a natural part of having a conversation. One could easily imagine
being in a scenario similar to the one described of looking for the petrol station. If we
do meet passers-by who have proclivities towards violating conversational maxims,
we can also imagine continuing the conversation after the breakdown occurs. For
example in the first case (where the passerby violates the maxim of quantity), we
could simply ignore the verbiage and continue with another question, e.g., “It’s
10:00pm. Do you think they are still open?”

The exception is, of course, when the passerby (or other party in general) violates 
the maxim of quality. Unless the reply is blatantly impossible or untrue, we would not 
be aware of a breakdown, and there would be no repair. 

Breakdown and repair over a longer exchange is also possible in a non-speech sce- 
nario. The examples above were chosen for simplicity. We have analyzed the card 
game of bridge where both the bidding sequence and the subsequent play of the cards 
are very much subject to the Cooperative Principle and part of the allure of the game 
is in trying to communicate the structure and strength of one’s bridge hand to one’s 
partner. In bridge competitions, verbal communication between partners is not al- 
lowed. This results in occasional misunderstandings (breakdowns) since the players 
are restricted to a limited bidding vocabulary to describe their position, but it is also
interesting to see how having the right shared context (bidding rules and conventions)
allows the communication to occur, and breakdowns to be repaired.



2.5 Mobile Device Interaction as a Conversation 

A full philosophical discussion of whether mobile device interactions are conversa- 
tions is out of the scope of this paper. It is sufficient that we can show that the idea is 
not ludicrous and that intuitively, Grice’s Cooperative Principle applies, and that the 
conversational maxims do capture the human user’s expectations when he or she 
interacts with the device. 

We will therefore list what, intuitively, are the characteristics of a conversation, 
and show that how we interact with a mobile device satisfies these characteristics. 

Characteristics of a Conversation 

From Grice, and with appeal to our own intuitions, we have the following characteris- 
tics: 

1. Conversations take place between two or more parties. At least one of these parties
should be human, conscious, and cognitively mature (this is to exclude talking in 
one’s sleep, the babbling of small children, etc.) 

2. Conversations result in an act of communication, i.e., that there is an information 
flow. The flow may be in one direction only. 

3. Conversations must be cooperative. The parties involved want to maintain the
conversation (until a logical end) and both will work towards a common goal. This
distinguishes conversations from, for example, command and control interactions.

4. The event of a conversation need not be at a single time and place. This allows 
instant messaging, SMS messages and IRC chat to count as conversations. It also allows,
for example, an advertising campaign to communicate a message that is distributed 
either or both temporally and geographically. 

5. The medium of a conversation is not important. We mostly think of interactive 
speech as conversation, but new modes such as SMS and internet chat are becom- 
ing pervasive. It is also true that in human-human communication, there are com- 
munication channels such as body posture, movements, gestures, eye contact, etc., 
which complement, supplement, and can often replace human speech. Thus, this 
last condition is a liberal one but does try to capture how conversations proceed. It 
allows alternatives such as art, advertising, human-computer conversation [6, and 
other papers in the same workshop series], etc., to count as conversations. 

There are many more (and different) characterizations of a conversation, but the ones 
listed are particularly relevant to the area of human-interface design.

Mobile Device Interaction 

The first characteristic, that at least two parties are involved, is trivially true in our 
interactions with a mobile device. 

The second is not so straightforward. Many of our interactions with a mobile device
(e.g., calling a number on a cellular phone) do not seem to require an information
flow, and would simply fall in the realm of command and control. However, in almost
all cases, there is some form of feedback and the ability to change, correct, or abort
the interaction. For example, in calling a number, there is visual feedback of the nu- 
merals on the screen during the input phase and of the number in its entirety before 
we confirm and activate the call. The interesting thing is what comes next. In a con- 
versational setting, there is the concept of etiquette 5 in the flow of a conversation. 
This includes unwritten rules for barge in, turn taking, etc. All of these are violated 
when we receive a phone call. If we are in the middle of another activity, we have to 
apologize and excuse ourselves to answer the phone. From the caller’s point of view, 
would our actions be different if we knew the called party was in an important meet- 
ing? If yes, then it implies a breakdown in the communication act, i.e., it is consistent 
with the idea that calling somebody has the etiquette of a conversation. Repair is immediate
in many cases; either the calling party asks “are you free to talk now?”, or the
called party asks to be called back later.

The third condition, which is that the two parties are in cooperation, can be recast 
in two directions. First, is the user cooperating with the device (or the device inter- 
face)? The answer is probably affirmative in that without cooperation (the user adapt- 
ing to the interface), it is unlikely the device can be used. Second, is the device coop- 
erating with the user, or rather does the user expect the device to cooperate with him? 
Reeves and Nass in [8] suggest that we unconsciously anthropomorphize computers
and other new media devices. This can be seen, for example, in how users are
frustrated when the system doesn’t do what they expect. That implies that, consciously
or unconsciously, we expect the device to cooperate with us.

5 See [7] for example; politeness is also a theme in the 3rd International Workshop in
Human-Computer Conversation, 2000.

The fourth condition, that a conversational event can be distributed over time and 
space is not an issue for user interaction with a device. This is trivially satisfied. 

The last condition, that the medium of communication is not important, is also 
trivially satisfied. 

We have shown that our intuitive list of conditions can be met by human interac- 
tion with mobile devices. We could also have added the condition that conversations 
take place over time, i.e., to try and capture the intuition that a conversation between
two people is a series of exchanges and not just a single exchange. The problem 
comes then in, first, fixing the lower bound for when it is no longer a conversation, 
and second, if it is not a conversation, then what is it? To keep things simple, we have 
removed this condition. 

Nevertheless, we should note that a human performing a task such as doing an 
internet search on a mobile device probably has more exchanges (interactions) than 
doing the equivalent task on a desktop computer. This is due primarily to the much 
smaller screen real estate of typical mobile devices (especially mobile phones) and the 
lower bandwidth channels of many of them (GPRS on a cellular device vs. LAN ac- 
cess on a desktop). For most mobile phones, there are the added limitations of query 
input mechanisms (no keyboards), manipulation mechanisms (limited scroll buttons 
vs. direct manipulation using a graphical interface), etc. Thus our interactions with a 
mobile device and a mobile phone in particular, need to be as efficient as possible. 



3 Conversational Design 

This section introduces the idea of conversational design. In particular, it suggests that 
we should use the Cooperative Principle to create interfaces which are as efficient as 
conversations. We look at the idea of designing for breakdown and repair, and look at 
a case study on the design of a mobile phone interface. 



3.1 What Is Conversational Design 

Conversational design simply means that we should design device interactions so that 
they proceed smoothly and efficiently just like conversations between two people. 
From our arguments earlier, we showed that human to device interaction counts as a 
conversation. As such, it should obey the Cooperative Principle. 

The Cooperative Principle suggests that there are two aspects to a shared context in
a conversation. The first is the conventional shared domain of the conversation. This 
is the one that is leveraged by interface designers currently to streamline the interac- 
tion. For example [9, 10], if we know that a user desires to retrieve an address on his 
or her mobile phone, we can customize the menus to restrict the number of options 
and the order in which the options are presented to minimize the number of button 
presses to accomplish the task. Note that here we are optimizing for efficiency (num- 
ber of button presses) and not for, say, precision in retrieval, since the result should be 
binary (retrieved address or failed). 
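The efficiency measure mentioned here, the number of button presses, can be made concrete with a small sketch. It is only an illustration and is not the method of [9, 10]: it assumes a linear menu in which reaching position i costs i scroll presses plus one select press, and the usage counts are invented.

```python
def expected_presses(menu, usage):
    """Average key presses per selection under the linear-menu assumption."""
    total = sum(usage[item] for item in menu)
    # Position 0 costs 1 press (select), position 1 costs 2 presses, and so on.
    return sum((pos + 1) * usage[item] for pos, item in enumerate(menu)) / total


def order_by_usage(menu, usage):
    """Most frequently used items first; ties keep their original order."""
    return sorted(menu, key=lambda item: -usage[item])


usage = {"Settings": 5, "Messages": 30, "Address book": 50, "Games": 15}
default_menu = ["Settings", "Messages", "Address book", "Games"]
tuned_menu = order_by_usage(default_menu, usage)

print(tuned_menu)
print(expected_presses(default_menu, usage), expected_presses(tuned_menu, usage))
```

Under this cost model, placing the most used items first minimizes the expected number of presses; with the made-up counts above the average drops from 2.75 to 1.75 presses per selection.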

The second (unconventional) aspect of the shared context is that which gives rise 
to Grice’s conversational maxims, and which is mostly ignored by traditional user 
interface designers. This is that the two parties are sharing the same task, i.e., they are
cooperating to accomplish the same task. In a non-cooperative command and control
situation (party A tells party B, which could be a device, what to do), there is no need
for the device to “maintain” the conversation. All the communication in the interac- 
tion goes from A to B. In a shared task, however, both parties are committed to con- 
tinuing and maintaining the conversation until its logical end. Information flow 
(communication) is bi-directional; both parties advance the conversation until the task 
is accomplished. 

Related Work 

The idea of designing interfaces to act (or behave) as if in a conversation is not new. 
R.S. Nickerson proposed such a paradigm in his seminal paper in 1976 [11] where he 
explored the elements of conversation including rules for turn-taking, non-verbal 
communication, etc. Later work in conversational design can be broadly divided into 
three areas. The first area is from the speech perspective, i.e., to build computer systems
which interact with the users using speech recognition and text-to-speech technologies.
One of the more successful versions of these efforts was the Portico voice
portal from General Magic, now existing only as General Motors’ OnStar service
[12]. Portico carefully constructed its speech dialogues to take conversational cues
and user expectations into account 6. The second area is from the discipline of human
computer interaction, where the extended interaction between human and computer is
modeled as a dialogue, and hence a “conversation”. Much of the work is on how the
computer system will react (or give feedback) to the user in response to a command, e.g.,
[13], and ignores the human characteristics of a conversation. The third area continues
in the direction started by Nickerson, and is epitomized by the work of Justine Cassell 
at MIT [14, 15]. Cassell’s work on building conversational interfaces relies on know- 
ing and understanding the various social and linguistic characteristics of conversation 
including turn-taking, feedback, repair, and generating and responding to verbal and 
non-verbal interactions. 

Conversational Design and Cooperation 

Our particular contribution to the notion of Conversational Design is therefore a para- 
digm in which the interface designer assumes that the user and the device will be 
cooperating in a conversation. As such both the user and the device have the right to 
expect certain behaviour of the other party, specifically, behaviour which obeys the 
conversational maxims. 

This has ramifications in both directions. The interface designer can assume
that the user is cooperative (predisposed to providing the right input) and create a
system which does not assume an idiot user. One very successful example of this is
the motorcar. The interface designer can assume that the driver has passed a driving 
test and knows the difference between the accelerator and the brake, and that he or 
she is competent to engage the cruise control, etc. In other words, that the driver and 
the vehicle are cooperating to accomplish the task of transportation. 

In the computing domain, Palm made that same assumption by providing only
Graffiti as an input mechanism on its highly successful PDAs. They assumed that the
user will cooperate to learn the Graffiti writing system, which allowed the device to
recognize written input with a much higher accuracy than previous devices. This
allowed the device resources to be concentrated on what the user wants to accomplish,
rather than on how to accomplish it. The various conversational maxims are
epitomized in the early Palm Pilot devices. If you analyze the interface, you would be
hard put to find many violations of the maxims.

6 From personal conversations with Dr. George White, formerly Director of Speech Technologies
at General Magic.

Coming back to the ramifications of mutual expectations, in the other direction, the
user should be able to assume that the device is cooperating to accomplish the task as
well. The device should not put up roadblocks or diversions in the path of the task,
e.g., confirmation dialogues for an action the user has clearly asked for. Contextual
menus, adaptive interfaces, etc., are all methods by which a device can cooperate
with the user. What is important, however, is that the implementation of the contextual
menus must satisfy the conversational maxims. For example, a menu
should not have too many or too few options (maxim of quantity), should not have
false options (maxim of quality), should not have irrelevant options (maxim of relation),
and should not be ambiguous, inconsistent, or disordered (maxim of manner).
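
The four menu checks above can be sketched as a small design-time test. This is only an illustration under stated assumptions: the `MenuItem` fields, the `check_menu` function, and the 2 to 7 item range are hypothetical choices made for the example, not part of the paradigm itself.

```python
from dataclasses import dataclass


@dataclass
class MenuItem:
    label: str
    enabled: bool = True    # False: the option cannot actually be carried out here
    relevant: bool = True   # False: the option does not apply to the current context


def check_menu(items, min_items=2, max_items=7):
    """Return (maxim, message) pairs for one contextual menu.

    The size limits are assumed values for illustration only.
    """
    violations = []
    # Maxim of quantity: neither too many nor too few options.
    if not (min_items <= len(items) <= max_items):
        violations.append(("quantity", f"{len(items)} options offered"))
    for item in items:
        # Maxim of quality: no false options.
        if not item.enabled:
            violations.append(("quality", f"'{item.label}' cannot be performed"))
        # Maxim of relation: no irrelevant options.
        if not item.relevant:
            violations.append(("relation", f"'{item.label}' is irrelevant here"))
    # Maxim of manner: labels that differ only in case or spacing are ambiguous.
    labels = [item.label.strip().lower() for item in items]
    for label in sorted({l for l in labels if labels.count(l) > 1}):
        violations.append(("manner", f"ambiguous duplicate label '{label}'"))
    return violations
```

A designer could run such a check over every contextual menu in a prototype and review the reported maxims by hand; the check only flags candidates and does not replace judgement.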



3.2 Designing for Breakdown and Repair 

One of the characteristics of human conversation is the potential for breakdown. 
Breakdown is defined as a failure of the underlying communication act which is car- 
ried out in the course of a conversation. It is possible for both parties to converse and 
satisfy all the characteristics of a conversation and still be subject to breakdown. A 
simple example may be when two people are having an interesting conversation
supposedly about a mutual acquaintance called Bob, but in fact it turns out that they are
talking about two different Bobs.

Most of the time, however, the breakdown is either immediately obvious, or be- 
comes obvious over time, and steps are taken by the parties involved to repair the 
breakdown. For example, party A mentions Bob’s wife, and party B says, “Wait a 
minute! Bob isn’t married, is he?” thus leading to further exploration and the discov- 
ery that they were discussing two different people. This is often followed by a good 
laugh and a switch of topic. This nullifies the breakdown, repairs the conversation, 
and initiates a new communication act. In almost all cases, the repair is automatic and 
takes no conscious effort. 

In a similar way, user interfaces should share the same characteristics. A conversa- 
tion works efficiently (fast and fluently) because the parties involved create a hypo- 
thetical model of a shared context [16, 17] (e.g., the idea that they are talking about 
the same person named Bob). Each person’s model will be different depending on 
their perspective on the context (e.g., Bob’s childhood friend would have a very dif- 
ferent model of Bob than a current working colleague) but they necessarily cooperate 
to create their own model based on the elements of the conversation. The model con- 
tinues to evolve as the conversation progresses. You can say that a communication act 
has occurred when one party’s model is changed or augmented causally by the act of 
conversation with the other party. 

One common example of breakdown and repair in interface design is the Undo
and Redo buttons in many interfaces. This is in contrast to the Confirmation
Dialogue which pops up, often in the same interface as well. If you are using a
drawing package, you do not want a confirmation dialogue every time you perform
an action (drawing a box, for example). If you make a mistake, you can repair it either by an
Undo command, or by erasing and starting over. In contrast, a popular operating
system, by default, pops up a confirmation dialogue each time you wish to delete a 
file. This happens even though the file goes to a special holding area (a Recycle 
Bin) from which an Undo (or Undelete) command can be used to repair a mis- 
take. In which of the above cases would you consider that the system is cooperating 
with you to achieve your task (whether it is to draw a box or to delete a file)? 
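
The contrast between confirming and repairing can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical `FileStore` class with an in-memory holding area; it is not the behaviour of any particular operating system.

```python
class FileStore:
    """Toy file store that deletes immediately and repairs through undo."""

    def __init__(self, names):
        self.files = set(names)
        self.recycle_bin = []   # most recently deleted name is last

    def delete(self, name):
        """Delete without a confirmation dialogue; the file moves to the holding area."""
        if name not in self.files:
            raise KeyError(name)
        self.files.remove(name)
        self.recycle_bin.append(name)

    def undo_delete(self):
        """Repair a mistaken delete; return the restored name, or None if nothing to undo."""
        if not self.recycle_bin:
            return None
        name = self.recycle_bin.pop()
        self.files.add(name)
        return name
```

In this sketch the common case (the user meant to delete) costs one action, and the rare case (a mistake) costs one extra action, instead of every delete costing two.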

So, the idea of designing for breakdown and repair is to focus on the task and make 
the interface cooperate to achieve that task most efficiently. We propose that checking
the interface to ensure that it does not violate the conversational maxims would help
considerably in creating such an efficient interface.



4 Mobile Phone Design Case Study 

In this section, we present a case study on the user interface of a mobile phone. The 
intention is to identify real user interface problems with the mobile phone, and to use 
the conversational design paradigm to analyze the problems. 

The following case study follows from work done previously in [18]. The study 
was conducted in two phases. The first phase was to identify real problems users had 
with their mobile phone interfaces, and the second phase was an analysis of relevant 
problems using the conversational design methodology to suggest an alternative de- 
sign. 



4.1 Phase 1: Real Problems 

We found 12 users who had personally owned and used the same phone interface. In
this case, there were 3 models of the phone made by the same European manufacturer,
all of which were based on the same platform (screen, buttons, controls), firmware,
and hardware. There were external differences between the models, including keyboard
layout (one was unconventional), and only two of the models had built-in FM radios.
The screen was 128x128 pixels, capable of displaying 4096 colours. All of the subjects
were well acquainted with the phone, being current users and owners or having only
recently changed to a new phone.

Demographically, we had ten male and two female subjects, with ages ranging from the late
20s to the mid 50s. All were local residents, but of various nationalities (Singapore,
People’s Republic of China, India, etc.). Seven of the subjects were technically inclined
(computer science, engineering, etc.); the remainder ranged from housewife to
marketing and sales.

Each user who was a current user of the phone was asked to consider his or her use
of the phone over the course of a day, and at the end of the day, to email a ranked list
of problems with the user interface which annoyed the user. Those who had changed
to a different phone were just asked to send a ranked list of their problems. We did
not ask for explanations of the problems, though several of the subjects provided them
as well.

The lists were normalized as follows. Each top problem was awarded 5 points, the
2nd problem 4 points, and so on down. The fifth and subsequent problems were each
awarded 1 point. The median number of problems reported was 3. Here are the results
in descending order of points. Only problems with at least 3 points are listed:

1. Difficult to switch on and off; button too small; have to press a long time 

2. Too many button presses for actions (several examples were given) 

3. Volume buttons too small 

4. No voice dialing 

5. Vibration mode too weak 

6. Difficult to find out details (phone no.) of received calls 

7. Very slow/difficult/too many button presses to delete individual SMS’s 

8. Cannot share information (pictures or ring tones) 

9. Inconsistent buttons for answering options (examples given) 

10. Cannot assign ring tones or pictures to particular numbers 

11. Synchronization software (for PC) not included with phone (have to download 
from website) 

12. No Bluetooth

13. Caller name sometimes doesn’t display even though caller number is in the ad- 
dress book 

14. Alarm sometimes cannot be turned off (alarm is part of calendar function) 

15. Not enough memory on the phone 

16. Unusable keypad (relevant only to one of the models) 

17. FM radio needs headphones plugged in to be used even with loudspeaker 

18. Screen gets dirty very easily; have to take apart to clean 
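
The normalization used to produce the ranked list above can be written out as a short sketch. The function names and the sample lists in the usage note are hypothetical; only the point scheme (5, 4, 3, 2, then 1 for the fifth and later problems) and the 3-point cut-off come from the text.

```python
from collections import Counter


def points_for_rank(rank):
    """Rank 1 earns 5 points, rank 2 earns 4, and so on; rank 5 and beyond earn 1."""
    return max(5 - (rank - 1), 1)


def score_problems(ranked_lists, threshold=3):
    """Sum points per problem over all subjects and keep those at or above the threshold.

    ranked_lists holds one list per subject, most annoying problem first.
    """
    totals = Counter()
    for problems in ranked_lists:
        for rank, problem in enumerate(problems, start=1):
            totals[problem] += points_for_rank(rank)
    kept = [(problem, score) for problem, score in totals.items() if score >= threshold]
    # Descending by score; ties are broken alphabetically so the order is stable.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))
```

For instance, with two invented subject lists, `score_problems([["power button", "too many clicks"], ["too many clicks", "no voice dialing"]])` places "too many clicks" first with 9 points.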



4.2 Phase 2: Conversational Design Analysis

Most of the problems listed above are not issues of user interface design but of
phone design in general. So, filtering away the poor hardware design (e.g., small buttons)
or limitations (e.g., weak vibrations, lack of memory) and the features (or lack of
them, e.g., voice dialing), we are left with the following:

1. Too many button presses for actions (several examples were given) 

2. Difficult to find out details (phone no.) of received calls 

3. Very slow/difficult/too many button presses to delete individual SMS’s 

4. Inconsistent buttons for answering options (examples given) 

The first three problems above seem to be very similar, but have different solutions 
from a conversational design perspective. 

Too Many Button Presses for Actions 

Here, the problem is that the device has a reconfigurable button (labels change to suit
the activity) which is either an action (select or open) or else option. However,
in a few cases, the first selection in the option list is to open. The interface design
philosophy seems to be: by default, provide an option list to select from; if there is only one
action in the option list, then replace the option list with the action. E.g., the default
interface is click option -> click open to open a folder.

This immediately violates the maxim of quantity. Most of the time, the user expects
to open the folder, so that should be the default. Providing all the options is
giving unnecessary information. The interface could be redesigned to be click
open -> click options only if desired.

This could also be construed as a breadth-first (show all the actions first) vs. a 
depth-first (show all the objects for the most common action first) display of the 
menus. Conversational design suggests that in a task-driven (i.e., user) context, depth- 
first will be the appropriate model. Breadth-first would be appropriate only if viewing 
the options (e.g., exploring the interface) was the task. 

Difficult to Find Details (Phone No.) of Received Calls 

To find out the details of received calls, the user has to click menu -> Call Register
-> select -> Received Calls to get to the list of calls. In contrast, to find
out details of dialed (outgoing) calls, the user only has to click the Answer button.
This is a violation of the maxim of manner; specifically that unnecessary prolixity
should be avoided.

One subject who had rated this as his most important problem clarified that as he 
was in sales, he would be in many meetings and would receive calls he would answer 
but which he could not process immediately, and would promise to return the call. 
Hence, he wanted an easy way to list the numbers of received calls so that he could call them back.
When asked, he also indicated that he used the list of dialed calls very often as well to
call clients again if they had been busy.

However, the phone only has one Answer button and no spare buttons available
for 1-click access to received calls. We examined other makes of phones to see if
this problem was there as well, or if there was a solution. We found that another
manufacturer merges the lists of received and dialed calls together, sorted by time,
and displays the merged list immediately when the Answer button is pressed. If differentiated
lists are desired, then the user can retrieve them individually through the long-winded
menu path. This seemed like an elegant solution. There was no ambiguity (which would
again violate the maxim of manner) because each call was prefaced with an icon denoting
either a received or a dialed call.
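
The merged call list described above can be sketched as follows. The `Call` record, the text stand-ins for the icons, and the newest-first ordering are assumptions made for the illustration; the text only says that the list is sorted by time and that each entry carries an icon.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Call:
    number: str
    when: datetime
    direction: str   # "received" or "dialed"


# Text placeholders standing in for the phone's icons (hypothetical).
ICONS = {"received": "<-", "dialed": "->"}


def merged_call_log(received, dialed):
    """Merge both lists into one, newest first, each entry prefixed by its icon."""
    calls = sorted(list(received) + list(dialed), key=lambda c: c.when, reverse=True)
    return [f"{ICONS[c.direction]} {c.number}" for c in calls]
```

One press of the Answer button would then show this single list, while the separate lists remain reachable through the longer menu path.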

Very Slow/Difficult/Too Many Button Presses to Delete Individual SMS’s 

The phone has a single command to delete all SMS’s in the inbox via menu -> Messages
-> select -> text messages -> select -> delete messages ->
all messages. While this sounds rather long, it can be done in 7 clicks.

However, if you wanted to delete specific SMS’s only and to leave others untouched,
the sequence becomes menu -> Messages -> select -> text messages
-> inbox -> select (select and open a specific SMS by sender name) ->
options -> Delete -> OK (confirm dialogue) before returning to the inbox. This
is a total of 9 clicks for one SMS, plus 4 more clicks for each additional
SMS. The reality is worse as the phone is slow to retrieve and open each SMS.

This sequence violates the maxims of quantity (too much information returned to
delete an SMS) and quality (asking for confirmation when there is no evidence that
confirmation is desired). If we design for conversational efficiency (i.e., catering for
breakdown and repair) then we should drop the confirmation dialogue and substitute
it with an option for undo. This removes the violation of the maxim of quality. For
the maxim of quantity, we notice that, while in the inbox, the Answer button is unused,
and propose the following change. The current select function could be done by
clicking the Answer button, and the current select button could be changed to option,
with delete message as one of its entries. This makes the entire branch of this menu
one level shorter.

To delete a single SMS, the sequence now becomes menu -> Messages -> select
-> text messages -> inbox -> options -> delete, which is 7 clicks
(reduced by 22%), with each additional SMS requiring only 2 clicks (reduced by
50%). So, for example, to delete five specific SMS’s, the original interface required
25 clicks plus the waiting time to retrieve and open the 5 messages. The proposed
method requires only 15 clicks (reduced by 40%) and no waiting time.
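
The click counts above follow a simple linear pattern, which the sketch below reproduces. The function names are ours; the constants (9 clicks plus 4 per additional SMS originally, 7 plus 2 in the proposal) are taken from the text.

```python
def clicks_original(n):
    """Clicks to delete n specific SMS's in the original interface."""
    return 0 if n <= 0 else 9 + 4 * (n - 1)


def clicks_proposed(n):
    """Clicks to delete n specific SMS's in the proposed interface."""
    return 0 if n <= 0 else 7 + 2 * (n - 1)


def reduction_percent(before, after):
    """Percentage saved, rounded to the nearest whole number."""
    return round(100 * (before - after) / before)


# One SMS: 9 clicks become 7. Five SMS's: 25 clicks become 15.
for n in (1, 5):
    print(n, clicks_original(n), clicks_proposed(n),
          reduction_percent(clicks_original(n), clicks_proposed(n)))
```

Because each additional SMS saves two clicks, the relative saving grows with the number of messages deleted and approaches 50% for long deletion runs.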

Inconsistent Buttons for Answering Options 

The phone device in question has a built-in loudspeaker. In addition to the Answer
and Hang-up buttons, there are two buttons which we will call Left and Right
which are located above the Answer and Hang-up buttons respectively. In the
analyses above, the Select and Option buttons are always the Left button, and
the Right button is reserved for the back (or return to previous screen) function.

The problem comes when answering a call. If the Answer button has not yet been
clicked, then clicking Left/option -> loudspeaker (2 clicks on the Left
button) both answers the call and activates the loudspeaker. The loudspeaker option is
at the very top of the list. Clicking Right/silence turns off the ring tone but does
not answer the call. Clicking Right again rejects the call.

However, if the Answer button has already been clicked, then clicking
Left/option provides various other options but now with the loudspeaker option
at the very bottom of the list; you have to click Right/loudspeaker to activate
the loudspeaker.

This is a violation of the maxim of manner (avoid ambiguity). If the description above
sounds confusing, the inconsistent interface is just as difficult to use. We
checked with one of the subjects who had listed this as a problem and he said that he
uses the loudspeaker function as a substitute for a hands-free earpiece while he’s
driving. He is, however, so used to clicking Right/loudspeaker to activate the loudspeaker
that he often clicks Right/silence -> reject (2 clicks) rather than
Left -> loudspeaker (also 2 clicks) when he tries to answer the call on the loudspeaker,
thus ending up rejecting the call.

Conversational design tells us to follow user expectations, in this case, to be consistent,
so we suggest the interface should be changed so that the Left and Right
button functions are swapped when the Answer button has not yet been clicked on an
incoming call, i.e., clicking twice on the Right button results in answering the call
and activating the loudspeaker. The problem is that there are now two ways of answering
the call and there is the possibility that a call is rejected rather than answered
if a finger slips since the Hang-up button is right below the Right button.

A better way may be to remove the Right/loudspeaker function to avoid the ambiguity
of having two ways of answering a call, and just make Left -> loudspeaker
(2 clicks) a consistent option regardless of whether Answer has been clicked or not.
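
The consistency argument can be expressed as a check over a table of button sequences per call state. The tables below are a simplified reading of the description above, and the state names, the `inconsistent_functions` helper, and the assumption that the proposed design keeps loudspeaker at the top of the option list in both states are ours.

```python
# Button presses needed to reach a function in each call state (simplified).
ORIGINAL = {
    "ringing": {"loudspeaker": ("Left", "Left"), "reject": ("Right", "Right")},
    "answered": {"loudspeaker": ("Right",)},
}

# Assumed proposal: the same Left, Left sequence whether or not the call is answered.
PROPOSED = {
    "ringing": {"loudspeaker": ("Left", "Left"), "reject": ("Right", "Right")},
    "answered": {"loudspeaker": ("Left", "Left")},
}


def inconsistent_functions(mapping):
    """Return the functions whose button sequence differs between states."""
    seen = {}
    for functions in mapping.values():
        for function, presses in functions.items():
            seen.setdefault(function, set()).add(tuple(presses))
    return sorted(name for name, variants in seen.items() if len(variants) > 1)


print(inconsistent_functions(ORIGINAL))   # ['loudspeaker']
print(inconsistent_functions(PROPOSED))   # []
```

A table of this kind makes the maxim-of-manner violation visible at design time, before a user discovers it while driving.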



5 Mobile Information Access 

There are many aspects of mobile information access. These include issues such as the
following:

• how to enter information, e.g., BlackBerry-style keyboards, handwriting recognition,
speech, etc.,

• how information is output on the mobile device, whether on a limited screen or 
through speech or any other modality. How to compress results, or summarize, or 
improve retrieval precision, 

• how to navigate within the device, and within applications, and how to add new
functionality onto the device. For example, if you download a Java application to a
Java-enabled mobile device, you also import that application’s user interface which
is independent of the user interface of the inbuilt mobile device functionality.
There are issues of efficiency and robustness, ease of use, etc.

• how to leverage the mobility aspect, leading to work in location based services and 
equivalent. 

All of these, while valid areas of research, can be tritely summed up as getting the 
right information at the right time in the right form. Thus if I were driving a car, and I 
wanted to get to someplace new to me, there are many ways that information could be 
provided: 

• if the information was a map of the area, I would have the right information, but
neither at the right time (which would be before I started driving) nor in the right
form (it’s not safe to use the map while driving),

• if the information was a set of instructions to follow on a website, I would have the 
right information and in the right form, but not at the right time, 

• ideally, I want to be told when to turn left or right, with sufficient notice to slow
down or filter to the correct side of the road, etc. And the instructions should be
delivered in a pleasant, clear speaking voice which I can hear without having to take
my eyes off the road.

All this is not new and there are car navigation systems that attempt to do precisely 
what I mentioned. But currently, each application has its own interpretation of what is 
the right information at the right time and in the right form, and none of them take 
into account the situation of the user, i.e., the user has to learn or adapt to the naviga- 
tion system. There are no “rules” to decide if an output or demand for input is right or 
wrong, or even what it means to be “right” or “wrong”. 

Conversational Design, following Grice, is an attempt to codify what it means to 
be right, and what kind of questions we need to ask to ensure that, subjectively, as in a 
conversation, the system provides the user the information that meets his needs, both 
at the explicit (lexical content) level and at the implicit (structural) level. This was 
illustrated in the previous section. 

Unlike similar work based on conversations [e.g., 19, 20], Conversational Design 
does not try to implement conversational behaviour in systems. Rather it is a design 
paradigm that can be integrated into a design workflow to test the interface design. 
Here are two possible implementation strategies based on the standard ethnographic- 
type design approaches 7 : 



7 There are many flavours of ethnographic design, but essentially they have the same guiding
principles and key design activities. For innovation design, see [21]. More generically, see
The Better Product Design website, at http://www.betterproductdesign.net/.

• Before designing an interactive information access system (mobile or otherwise),
put a human intermediary between the system and the user. Assume that the intermediary
is totally dedicated to that single user (as a personal mobile device would
be). Observe the interaction between the intermediary and the user; analyze it and
replicate it in the final product.

• In the course of a normal ethnographic design process, after the qualitative phase 
and before the quantitative phase, insert an extra step which uses Conversational 
Design as a test suite for the mobile device interface. 

Of course, it also helps if the interface designer is familiar with Conversational De- 
sign; as a paradigm, Grice’s maxims should be a constant watchdog on the design 
process. 

6 Concluding Remarks 

In this paper we have borrowed the Cooperative Principle and the conversational
implicatures from Grice’s Logic and Conversation. We have applied them to the idea of
user interface design, specifically for mobile devices, and introduced the idea of
conversational design, and of designing to cater for breakdown and repair during efficient
conversations. We presented a case study on a mobile phone user interface and showed
where conversational design can provide a new perspective on interface design and
how such an analysis could help create better interfaces for mobile devices.

Future work for conversational design involves experimental validation. We have
performed a few simple experiments based on the changes in user interface mentioned
above. None of the subjects made any errors using either the original or the proposed
interface. As such, no significant conclusions could be drawn. If we measured
the experiments by efficiency (i.e., by the number of button clicks), then the conversational
design interface would be much more efficient, but since the experiments were
simulated on a laptop rather than “live” on actual mobile phones, the results have no
particular validity.

Abstract. This paper examines user interface issues in mobile services. Experiences
from the development work of a mobile connectivity service are compared
to published recommendations for small interface design. It is concluded
that for a multi-channel mobile service, it is crucial to provide similar content
with different access methods. By designing applications to enable easy one-handed
navigation, applications can be kept simple enough to ensure that multi-channel
delivery - porting to different environments, screen sizes and devices -
does not require unreasonable effort.



1 Introduction 

This paper examines software user interface issues in mobile communication applica- 
tions, with a special emphasis on mobile office solutions. Findings from a literature 
review on the subject are compared to experiences from the development work of a 
mobile connectivity service. 

Mobile telephones have been a success in the mobile market, establishing wireless 
phone calls and short messaging through SMS as a means of communication. In addi- 
tion to using fixed-line phone calls and e-mail, more and more people are moving to 
mobile communication. Mobile e-mail is predicted to be one of the next big things in 
mobility, and signs of its business potential have already been seen in Japan where 
mobile Internet has made its breakthrough with tens of millions of users. 

Mobile communication applications may be used with devices like mobile phones, 
personal digital assistants (PDAs), or pagers. Some of the devices enable wireless 
communication with other devices or with servers through some built-in software and 
over a protocol like SMS, WAP or HTTP.

Typically mobile communication applications are used by people “on the go”,
meaning that the users do not reserve separate time to use an application, but use it
while they are doing something else. The devices the applications are used
on have size, interaction, and processing power limitations, but despite these limitations
they offer some advantages over desktop computers, like portability and
instant access to time-critical information.

Usability research on “large” interfaces like desktop computers is an established
practice, and various design guidelines for this kind of application exist. It is, however,
not obvious that all of these widely recognized design principles apply as such
to the design of small interfaces [10,11]. Only in recent years has the rise of mobile
phones increased the research effort invested in small interface design, and guidelines
for small interfaces have started to emerge.

1.1 Comparison of Mobile Applications and Desktop Applications

Mobile applications differ in various ways from their desktop counterparts. Along 
with the characteristics of mobile devices and the connecting network come certain 
limitations [10,14,18]: 

• Low computational power, small memory and cache, and usually no mass storage 
devices like hard disks. 

• Small display size, and a lot of variation in display dimensions. 

• Restricted color display - e.g. for mobile phones, the number of color displays has 
only recently started to grow. 

• Limited fonts and text size. 

• Restricted input methods make text input slower than on a full PC keyboard. 

• Often there is no pointing device for activating objects, which limits the possible 
user interface components and slows down object activation. 

• Some devices support only vertical scrolling.

• Network connections to handhelds have low bandwidths and are quite
unstable.

• Handheld operating systems do not offer the same variety of services as desktop 
operating systems. For instance, many operating systems do not support threads or 
processes for background tasks, a common technique for desktop computer appli- 
cations. 

Mobile applications follow a different usage paradigm than desktop applications: 
they are designed for a small display, have to provide short start-up and response 
times and are developed for gathering and presenting small pieces of information 
rather than processing large amounts of data [14]. Mobile users can access the mobile 
Internet or application at any time and anywhere, e.g. to kill short periods of time 
when they are not busy with something else. They play games, check their e-mail, or 
read the daily news headlines e.g. while they wait for an appointment. 

Users also often access the applications while doing something else, either to help
perform another activity, or for a purpose completely unrelated to it. Therefore
they expect the services to be accessed easily by clicking a few buttons. Weiss [19]
calls this approach “hunting” for information, as compared to “surfing” on the desk- 
top web. 

The initial position for designing a mobile application is very different from that 
for designing a desktop application. Design issues that specifically challenge a mobile 
application designer include the following [1,10,18]: 

• Information visualization, due to the small screen. 

• Information navigation, i.e. finding the path and actions necessary to find a piece 
of information on a site, and getting back when needed. 

• Interaction constraints. For instance, requiring the use of both hands to operate a 
device when standing in a bus may not be a good idea. Ideally devices and ser- 
vices should enable easy one-handed use. 

• Context of use. The context of use is harder to predict than with an office PC application.
Since mobile devices rarely have the capabilities of stationary computers,
they are not likely to be the complete solution to the user’s problems. Instead,
they are more of a support in activities, where, ideally, the user’s main
focus is on the activity taking place rather than on the technology supporting it.

• Access speed. The users often fill short gaps of unproductive time with mobile
applications. Therefore the possibility to access pertinent information quickly is
crucial.

• Cost. The user may have to pay for each piece of data transferred over the net- 
work. 

Besides limitations, mobile applications provide unique opportunities for their con- 
tent [18]: 

• Personalization. The content of applications can be personalized, and content and 
services can be billed e.g. via mobile phones. A mobile application can, for exam- 
ple, allow users to purchase public transportation tickets electronically via their 
mobile phones. 

• Location-sensitive services. A mobile phone can be used both independently 
(anywhere) and depending on its location. For instance, making phone calls is 
normally location-independent, but routing information for public transportation is 
location-dependent. 

• Timeliness of content. Mobile service users can access content precisely when they 
need it, and can receive and retrieve timely information. For example, mobile ser- 
vices can employ alerts for last minute concert ticket sales or up-to-the-minute 
stock-trading information. 

According to Wallace et al. [18], the most successful mobile services try to use at 
least two - if not all - of the above listed characteristics. 



2 Guidelines for Mobile Application and Service Design 

As noted above, it is clear that for the development of user-friendly small interface
services and applications on mobile devices, a revision of guidelines originally meant
for large displays and interfaces like PCs is necessary. An overview of existing guidelines
for small interface mobile devices in the literature is presented in this section.
Guidelines that apply to small mobile interfaces in general are presented, followed by
a collection of guidelines more closely addressing the mobile content and navigation.

General design guidelines for mobile devices include the following: 

• Design for users on the go. The design for mobile devices must include context 
and forgiveness [19], and provide time-critical information [15]. 

• Enable fast use. Two major considerations for the users of a mobile service are the 
cost of access and the speed of downloading content [18]. Many users are paying 
for mobile services by the minute, so if they cannot get the information they are 
looking for within a short period of time they will stop using the service [12,17]. 

• Keep it simple. The old adages “keep it simple, stupid” and
“less is more” certainly apply to mobile devices and services. For instance,
the most successful PDA devices do not attempt to replace the PC, but to complement
the use of the PC and of some other traditional tools [13].

• Provide feedback and navigation cues. It should be obvious what the application
is, and how one can navigate from the page [6,19].

• Include self-recovering capabilities. Even if the network goes down, the service or
application need not [13,19]. There should be means to restore the values or written
text, or to have them restored automatically.

Content design guidelines for mobile devices include the following: 

• Present the most important content first. The most important content should ap- 
pear at the top of the page [2,7,13,15,19]. 

• Keep content compact. It is recommended to keep the pages short [2,7,9,10,12,13]. 

• Don't make the page layout complicated. It is recommended to keep pages simple
and task-oriented, possibly text only, and to avoid elements that don’t add direct
value to the content [2,9,12,13].

• Use simple text elements and styles. The elements used in text layout should be 
clear and simple [2,12,18,19]. 

• Pay attention to page titles. It is important that the page title elements are descrip- 
tive, since they enable bookmarking and knowing where one is [10,15,17]. The ti- 
tles should however be short, preferably less than 15 characters [12,13]. 

• Keep documents small. Because there are various memory restrictions in mobile 
devices, the documents should be kept as small as possible [12,18]. 

• Use compact link names. Long linked text can make a page difficult to read and 
time consuming to scroll. It is recommended to use only one or two words as the 
title of the link [18,19].

• Design clear forms. Forms should not be too long [10]. A clear way to cancel the 
form filling and for going back should be provided, but attention should be paid to 
form resets, since on small devices, forms are laborious to refill if all values are 
reset by accident [18]. 

• Use smart graphics. If graphics are used at all on small devices, they should be 
made informative, small and simple [13].
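
Two of the content guidelines above are concrete enough to be checked mechanically: page titles of preferably less than 15 characters [12,13], and link names of only one or two words [18,19]. The following is a minimal sketch in Python, assuming a hypothetical page representation (a title string plus a list of link texts). The function name and constants are illustrative only and do not come from the cited guidelines.

```python
# Thresholds taken from the guidelines quoted in the text; the names
# below are illustrative assumptions, not part of any cited tool.
MAX_TITLE_CHARS = 15   # titles "preferably less than 15 characters"
MAX_LINK_WORDS = 2     # link names of "only one or two words"


def check_page(title, link_names):
    """Return a list of guideline warnings for one page (empty if none)."""
    warnings = []
    if len(title) >= MAX_TITLE_CHARS:
        warnings.append(f"title too long ({len(title)} chars): {title!r}")
    for name in link_names:
        if len(name.split()) > MAX_LINK_WORDS:
            warnings.append(f"link name too long: {name!r}")
    return warnings
```

A check of this kind only covers the measurable guidelines; whether a title is descriptive or a layout is simple still has to be judged by a designer.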

Navigation design guidelines for mobile devices include the following: 

• Minimize steps in navigation. With small screen devices, it is very important to 
design for economy of navigation [2,6,10,15,18]. Users will be frustrated by 
scrolling through long lists of options, filling out complex search forms, and see- 
ing needless pages along the navigation path. 

• Selecting instead of typing. It is recommended to consider whether it is possible to 
ask the user to choose from a default list using select lists, checkboxes or radio 
buttons rather than typing in a selection [2,12,13,17,18,19]. Alternatively one can 
offer a default list together with an input box. 

• Keep the navigation consistent throughout the service. The way in which a user
makes his or her way through the pages that constitute a service, interacting via 
links, menus and data input should be kept consistent throughout the service 
[12,19]. 

• Design flat menus. It is recommended to keep menus flat, because it is often diffi- 
cult to form an overview of a service containing too many layers, and because a 
deep hierarchy makes the use more difficult [2,12,15,19]. 

• Cross link. The Back functionality is the most important way to go back. However, 
when users need to go back several levels, links to the starting page and subsection
main pages are useful [10,12,15,19]. A simple tree design is efficient, but the
deeper the navigational hierarchy gets, the more necessary it becomes to get back 
to the starting point, and also to other pages. 

• Provide confirmations for important actions. Confirmations must be there for
actions like changing important values or deleting items. Even though the user 
needs to click OK on the confirmation page, that requires much less effort than 
e.g. returning to a list to check if an item was really removed [10]. 

• Searching should be intuitive. Searching should be a step-by-step, logical process 
[15]. Once the search is performed, the results must be easy to scan, and the in- 
formation should enable making good, informed choices within the results 
[6,10,15]. 

3 Experiences from Development Work 

This section presents usability-related experiences from the development work of the 
SMS, WAP, Web and Voice accesses to corporate information provided in the Nokia 
One Mobile Connectivity Service. Guidelines presented in the previous section have 
been used to make design decisions and for evaluations during the various stages of 
development work with Nokia One applications. The guidelines have proven useful in 
development iterations. 



3.1 Presentation of the Service 

The Nokia One Mobile Connectivity Service is an application service that provides 
access to corporate e-mail, calendar and directory information from a GSM phone, a 
PDA, a PC or a fixed-line phone. The service enables sending and receiving e-mail, 
scheduling meetings and appointments and accessing corporate directories, e.g. while 
traveling or out of the office. It is targeted for business users. Out of the three charac- 
teristics that Wallace et al. [18] relate to successful mobile services, Nokia One ap- 
plies two and leaves one out: it has personal information and timeliness, but is inde- 
pendent of location as it provides the same information to all locations. 

Access Methods and Applications. The Nokia One service has four different access 
methods based on the SMS, WAP, Voice and Web protocols. Table 1 presents the 
applications provided with each access method at the time of writing. For Web access, 
large screen (PC) versions of e-mail and calendar exist, but as this paper concentrates 
on small interfaces, they are left out of the table. 

Table 1. Nokia One applications by access method at the time of writing. Large screen e-mail
and calendar are left out, as they are not within the scope here.

Access method   E-mail              Calendar            Directory information
SMS             Yes                 Yes                 Yes
WAP             Yes                 Yes                 Yes
Web             Under development   Under development   Under development
Voice           Yes                 Yes                 No



The applications in each access method are presented in more detail below. 

The SMS access. The SMS access is based on sending short commands like "m" (for
mail), "c" (for calendar), or "find" followed by a name (for the directory service) to a
service number, which shortly sends a response. The responses usually come in the
form of numbered lists, which enable viewing items and navigating between
them. The items can be e-mail messages, calendar events, or entries found in
the directory service. If multiple items are presented in a list, an item can be viewed
by sending its number (e.g. "1" for the first e-mail message, calendar event or directory
service item). Figures 1, 2, and 3 present examples of commands sent to
the service through SMS and the responses given by the service.
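
The command scheme described above can be sketched as a small parser. This is only an illustration in Python, not the Nokia One implementation: the action names are hypothetical, and treating commands as case-insensitive is an assumption (the text shows "m" and "find", the figures show "M" and "Find"). The empty message that requests the next page is taken from the description of page counts later in this section.

```python
def parse_sms_command(text):
    """Map an incoming SMS body to an (action, argument) pair.

    The action names are hypothetical labels used for illustration.
    """
    body = text.strip()
    if body == "":
        # An empty message, or one containing just a space, asks for the next page.
        return ("next_page", None)
    lowered = body.lower()  # assumption: commands are case-insensitive
    if lowered == "m":
        return ("list_mail", None)
    if lowered == "c":
        return ("list_calendar", None)
    if lowered.startswith("find "):
        return ("find_person", body[5:].strip())
    if body.isdecimal():
        # A bare number selects an item from the most recent numbered list.
        return ("view_item", int(body))
    return ("unknown", body)
```

For example, `parse_sms_command("Find Simon T")` yields the pair `("find_person", "Simon T")`, which a service could then hand to its directory search.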



[Screenshots: two phone screens. Left: the command "M" being sent. Right: a response headed "1/3" that lists "1. Jonathan De: Cultural issues in" and "2. ext Maria S: RE:".]



Fig. 1. An example of e-mail use through SMS. On the left, a request for new mail, and on the 
right, a response that shows that there are three SMS pages of message headers, out of which 
the first one is displayed. More of the response message is to be found by scrolling down. By 
sending the number of a message (e.g. "1" for the first message), the user can read the message 
content. 



[Screenshots: two phone screens. Left: the command "C" being sent. Right: a response from 12400 that lists "1. Mon 25 9:15AM IT seminar" and "2. Tue 26 2:00PM research review".]



Fig. 2. An example of calendar use through SMS. On the left, a request for calendar events in 
the near future, and on the right, a response displaying a list of two events. 



[Screenshots: two phone screens. Left: the command "Find Simon T" being sent. Right: a response from 12400 that lists "1. Simons Tina (HO Communications)" and "2. Simonsen Tony (Research Center)".]

Fig. 3. An example of directory service use through SMS. On the left, a request for information 
on people whose names match the input, and on the right, a response displaying two people 
who match the criteria. 

An item is split into several SMS messages if it is longer than the SMS length
supported by the GSM phone in question. This is indicated by displaying a
"page count" at the beginning of the message (see the response
message in Fig. 1). The user moves to the following page by sending an empty
message, or a message containing just a space character.
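
The paging behaviour described above can be illustrated with a short Python sketch. It is an assumption-laden example rather than the service's actual algorithm: the function name is hypothetical, the default limit of 160 characters is the usual single-SMS length for the GSM 7-bit alphabet, the "page/total" prefix format is modelled on the page count visible in Fig. 1, and the text is cut at fixed character positions rather than at word boundaries.

```python
def split_into_sms_pages(text, limit=160):
    """Split text into pages of at most `limit` characters.

    Each page of a multi-page item is prefixed with a page count
    such as "1/3 ". A text that fits in one message is returned as is
    (an assumption; the paper does not say what a single page shows).
    """
    if len(text) <= limit:
        return [text]
    total = 2
    while True:
        # Reserve room for the longest possible prefix, e.g. "12/12 ".
        prefix_len = len(f"{total}/{total} ")
        chunk = limit - prefix_len
        if chunk <= 0:
            raise ValueError("limit too small for a page-count prefix")
        needed = -(-len(text) // chunk)  # ceiling division
        if needed <= total:
            break
        total = needed
    pages = [text[i:i + chunk] for i in range(0, len(text), chunk)]
    return [f"{n}/{len(pages)} {page}" for n, page in enumerate(pages, 1)]
```

The loop is needed because the prefix itself consumes characters: a longer page count leaves less room for content, which can in turn increase the number of pages.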

Several other e-mail functions are also supported by the SMS access, e.g.
sending, replying to, and forwarding e-mail, receiving notifications
of arriving messages, and browsing older messages and messages in folders
other than the inbox. The calendar application also enables e.g. browsing time periods
selected by the user, adding calendar events, and using a mobile phone's calendar
together with the service. In addition to the name search, the directory service
also supports searching by phone numbers and business units.

The WAP access. The WAP access provides interfaces for e-mail, calendar and corpo- 
rate directory. The navigation is based on links, and is thus more intuitive to most 
users than the “command line” type of interaction in SMS. A starting page provides 
access to all applications, and to WAP settings that affect the WAP browsing. The 
applications are also cross linked with WAP’s Options menu, so that returning to the 
starting page is not obligatory for moving between the applications. Moving up in the 
navigation hierarchy is made easier by providing links to one level up, and to the 
starting page at the bottom of each page. 

The WAP e-mail application enables navigating within and between e-mail mes- 
sages and folders, sending, replying to, and forwarding e-mail, searching and sorting 
messages, and viewing attachment files. E-mail in folders is divided into unread (new)
and read (old) messages. If one of these links is selected, the user gets to an e-mail 
list. The list is divided into five message headers per WAP page. When the user se- 
lects a header of an e-mail message, the message in question is opened. If the message 
is long, it is divided into two or more pages. The next part of the message can be 
reached by selecting the link More. Examples of a WAP e-mail list and message 
screens are presented in Fig. 4. 



[Screenshots: two WAP screens. Left: an inbox list with message headers, including the subjects "Cultural issues" and "RE: Book ref". Right: a message screen with the text "I don't have a copy myself, but I'm sure Joanne can lend hers".]



Fig. 4. Examples of a WAP e-mail list and message screens. On the left, the list, and on the 
right, the message. 

The WAP calendar application enables listing calendar events by day, week, or
month, viewing, searching and editing them, creating new events, and requesting
events to be sent to the phone as vCalendar notes. Calendar event lists are divided into
five events per WAP page. When the user selects a header of an event, the event in
question is opened. If the event description is long, it is divided into two or more pages,
similarly to e-mail messages. Examples of a WAP calendar event list and event detail
screens are presented in Fig. 5.



[Screenshots: two WAP screens. Left: a week view for 08/25/03-08/31/2003 that lists "Mon 08/25 9:15 AM - IT seminar (AB 605)" and "Tue 08/26 2:00 PM". Right: a details screen showing "Subject: Research review", "Location: B606" and "Description: Let's see".]



Fig. 5. Examples of a WAP calendar event list and event detail screens. On the left, the list, and 
on the right, the event details. 

The WAP directory application enables searching contacts from the corporate
directory, viewing contact details, and saving them on the phone as business cards
(vCards). Examples of a WAP directory search response list and contact detail screens
are presented in Fig. 6.



[Screenshots: two WAP screens. Left: a search result list with "1. Simons Tina (HO Communications)" and "2. Simonson Tony (Research Center)". Right: a details screen showing "Tina Simons", "Communications Officer" and "HO Communications".]



Fig. 6. Examples of a WAP directory search response list and contact detail screens. On the 
left, the list, and on the right, the contact details. 

The Voice access. The voice access provides an interface to e-mail and calendar. It is 
used by calling a service number, where a speech synthesizer reads out the e-mail 
messages or calendar events that the user requests to hear. The navigation is carried 
out through the speech engine providing guiding prompts suggesting what the user 
may want to do next. The voice access offers two alternative ways of giving commands:
speech commands that the user speaks out, and DTMF keypad commands that the
user gives on a phone's keypad. The speech and DTMF interfaces provide the same
commands. In addition to listening to messages or events, the voice interface enables 
replying to e-mail messages by recording a voice reply file (in WAV format) that is 
sent with the message as an attachment file. 

The following is an example of a possible excerpt from e-mail use via voice ac- 
cess: 

[Speech synthesizer] "... Message one from John White at Nokia dot com, subject
project meeting. Say read message, next header, previous header, or goodbye."

[User] "Read message."

[Speech synthesizer] "Reading message number one. Press zero to interrupt at
any time. Hello all, I think we should continue our..."

The interaction with the voice access to calendar is similar, for instance:

[Speech synthesizer] "... The appointment is at 11 AM and it's about conference
call. Say give details, browse calendar or goodbye."

[User] "Give details."

[Speech synthesizer] "The appointment is today at 11 AM and it lasts one hour.
It's at E727 and it's about conference call. Here is a more detailed
description..."

The Web access. The Web access provides an interface to e-mail, calendar and corpo- 
rate directory. At the time of writing, the small screen versions of the applications 
were under development, and thus they are not presented in detail here. As the large 
screen HTML browser applications are not within the scope of this paper, they are not 
presented here, although large screen (PC) versions of e-mail and calendar are fully 
functional. 

The functionality of the small screen browser applications will closely resemble
that of the WAP applications, but as HTML/XHTML enables the use of more advanced
formatting and of graphical elements like icons, some views are completely
redesigned to provide more value to the user. For instance, the week and month views of
the calendar application benefit from the use of tables to present the time periods in a
way users are used to seeing them in other calendar applications, and view selection
between week, day and month views can apply icons that help users quickly
recognize what the views are about.



3.2 Experiences 

This section presents experiences learned in the development of the Nokia One ser- 
vice. Experiences have been gathered from end users through spontaneous e-mail 
feedback and through user studies, from customer meetings where end users have 
been present, and from development work. Performed user studies include 3 interview 
studies with 16, 3 and 4 participants respectively, and one usability test with 3 partici- 
pants. The studies have involved users from three different companies. 

The objectives of the user studies were to gather user needs and feedback from 
Nokia One users, and to gather information about current usage methods and the con- 
text of use. The intention was to get rapid, grounded input for development work. 
Business users with different profiles were selected from client companies based on 
their work profile, Nokia One use experience and their availability at the time of the 
studies. 

In the first study, the aim was to cover the SMS and WAP accesses, and to get
feedback from long-term users by conducting semi-structured interviews. 16 users
from two different companies were interviewed, 8 from each company.
In one company, the users had on average 5
months of experience in using the service, while in the other company the average
experience was 1.4 years. However, little information was obtained about WAP use, and
thus another study was conducted to cover WAP use specifically.

In the second study, the research method applied was field usability testing, which 
included an interview and performing test tasks with the WAP interface to e-mail. 3 
Nokia One WAP users participated in the study. All of them worked for the same 
company. Their experience of the Nokia One WAP e-mail use ranged from 2 weeks 
to 4 months. 

A third study was conducted to cover the Voice access. 3 Nokia One Voice users 
from the same company, with 2 to 3 months of use experience, participated in semi- 
structured interviews. 

In order to ground the design of the WAP calendar application in user data, a focus
group session with 3 Nokia One SMS calendar users was held. In addition to the
focus group session, one "power user" of the SMS calendar application participated in a
single semi-structured interview. The focus group participants had used the SMS
calendar for 2 to 3 months, and the power user for 9 months.

The most important findings from the studies are presented in this section, along 
with experiences from other development work. As the studies had similar objectives, 
and as important lessons have been learned also outside them, all the results and ex- 
periences are presented together, and not separated by study or source. 

Mobile Applications in General. We have found that for a mobile service, it is
beneficial to provide the same applications on various access methods and for various
devices. This provides flexible access and minimizes the "gulf" between devices.
It also helps leverage demand for existing services not yet available on new
devices: once a mobile service gives access to some of the PC world's functionality,
users quickly start to expect other functionality familiar from the large
display and fixed-line connection as well. A mobile service can nicely complement the use of a
PC application, if the use of the service is fast and easy enough. For instance, a
mobile service that "gets to the point" fast can reduce the need for establishing a laptop
connection. If a mobile service is easy, fast and efficient to use, users can and will use
it often, even during very short breaks. Users can get "hooked" on the service - in a
positive sense. Easy authentication is an important part of creating a feeling of
speed and efficiency.

Flexibility of use is important. Users like it that there are several ways to use a ser- 
vice. Enabling users to switch between different access methods easily and efficiently, 
without losing the thread, is important. Moreover, multi-device support is crucial. 
Users, and especially large companies, don't want to buy many devices to be able to 
use mobile services. Once different devices are supported, tailoring the content for 
different devices is appreciated, as users get content optimized for their device. 

Different levels of information should be available on a mobile device. As recom- 
mended in several design guidelines [2,13,15,19], the most important information 
should be presented first, but more detailed or less important information should also 
be available. The default values for all service settings must be appropriate, but low 
effort for user-initiated customization is appreciated by those who want to change the 
settings. The service interface should enable customization in the same application 
that is affected by the settings. If the settings are placed outside the application, the 
users will not change them. For an SMS interface that cannot intuitively present the 
settings, a credit card size quick reference card has turned out to be an efficient aid. 

We have found that navigation is crucial for the user experience. This is not
surprising, as the importance of navigation is heavily emphasized in the literature
[2,6,10,12,13,15,17,18,19]. Being able to tell how to get to where one wants to go and
to accomplish what one wants to do, being able to tell where one is, distinguishing the
device's built-in features from those provided by the service, and removing
unnecessary steps from navigation were noted as important.

Moreover, we have found confirmations of important actions to be valuable, and 
that progress indicators are appreciated when actions take long. 

Application Specific Remarks. The following remarks were made about specific 
applications. 

E-mail. Mobile e-mail users were found to primarily read their e-mail, and only sec- 
ondarily take any action, like replying to the message. Many users just want to know 
if they have new e-mail or not. With WAP and Voice, users read longer messages 
than through SMS, and with WAP they write more than through SMS. Some users 
use the voice reply functionality in Voice e-mail. 

Automatic notifications of new e-mail as SMSs are popular. Together with the fact
that the mobile phone is almost always on the user, automatic notifications enable
users to react to e-mail messages in real time. Users can thus choose whether they
want to check their e-mail actively themselves, or whether they prefer the system to
tell them when new e-mail arrives. Automatic notifications, however, bring with them
the need for filtering, as many business users receive huge amounts of e-mail.

Calendar. Mobile calendar users are mostly interested in quickly checking the events
in the near future, especially their time and location. Viewing the current day's events
is the most important function of the calendar, and viewing the current week's coming
events the second most important one. What mobile calendar users appreciated most was
the fact that their calendar was online, without a separate need to synchronize it.

Moving events ahead is the most frequent action on events that
are already entered in the calendar. There is little need to change the contents of an
event, and the past is viewed very seldom. Most users use alarms as reminders of events.
The most often used alarm time is 15 minutes before the event. For appointments
taking place out of the office, this has to be adjusted.

Getting events as calendar notes to the mobile phone is useful, as well as being 
able to enter events directly from the phone’s calendar to the office solution’s calen- 
dar. 

Directory. The corporate directory users mainly use the application to get and store 
contact information on their mobile phones. Providing content as vCards, which en- 
ables saving directly on the mobile phone, was appreciated. Text format is also impor- 
tant however, since not all details can be included in a vCard. Since the directory 
application is fast and easy to use, some users even use it to get information about 
people who are in the same meeting with them. 

Voice applications. For voice applications, providing the user interface in the user’s 
native language can greatly improve the user experience even if the user has relatively 
good skills in a certain foreign language. In voice interaction, the guiding prompts 
need to get shorter as the user gets more experienced, and it must be possible to inter- 
rupt the speech synthesizer. The possibility to set the synthesizer’s speed is important, 
along with the possibility to navigate forward and backward within a message. 

One-Handed Navigation. WAP applications become almost naturally designed for
easy one-handed navigation, as WAP devices are typically used with one hand only.
Most WAP-enabled devices have no stylus, and thus the cursor stops on every link.
This means that it is best to present the content first on the page, and the navigation
tools only after it. This makes accessing content fast on devices that rely on moving
from one link to the next in the order provided by the application. If navigation links
were instead presented at the top or side of every page, the users would have
to navigate through these links on every page. Presenting navigation tools first works
well with large screen interfaces, though, since there the tools can always be visible.
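
The effect of link order on devices where the cursor stops on every link can be shown with a small model. The following Python sketch is purely illustrative: the function names are hypothetical, and it assumes that reaching a link costs one focus move per link that precedes it on the page.

```python
def layout_links(content_links, nav_links, small_screen=True):
    """Return the link order for a page.

    On small screens the content links come first and the navigation
    links last; on large screens the navigation links come first.
    """
    if small_screen:
        return list(content_links) + list(nav_links)
    return list(nav_links) + list(content_links)


def steps_to_reach(links, target):
    """Number of link-to-link focus moves needed to reach `target`."""
    return links.index(target)
```

With two navigation links on a page, the first content link is reached in zero moves when content comes first, but only after two moves when the navigation links are placed at the top, and that cost is paid again on every page.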

Enabling easy one-handed navigation is a good design driver for all small inter- 
faces, as it forces the interfaces to be simple and fast to use, and to provide the most 
important content first without any unnecessary scrolling. Navigation bars are useful 
on large screens, but very painful to scroll through at the top of every page on a small 
screen device - this is because one almost never wants to use the navigation links 
before having seen the actual content on the page, and thus they only slow the use 
down significantly. However, interfaces that enable easy use with one hand are easy 
to use also with two hands, e.g. with a touch screen and a stylus. 

4 Discussion

A literature review on suggested guidelines for mobile devices and applications was 
presented, followed by experiences from the development work of a mobile connec- 
tivity service. 

Designing for people who are on the move is a good design principle, as people use
a mobile service during even very short breaks if it is easy and fast [15,18,19]. Weiss [19]
notes that mobile commerce on the wireless web will become successful only once it is
more convenient than making a phone call. Similarly, people only use mobile e-mail if it
as a whole - with its response times, access speed etc. - is more convenient than waiting
to get to use the PC, for instance in the office.

The mobile service described in this paper presents specified sets of contextual in- 
formation, e.g. only new e-mail messages instead of all messages in inbox, and pro- 
vides search and sort possibilities on various levels, which has been observed to en- 
able fast use. Approaches closely related to this kind of implementation exist in 
literature: for large information structures, it has been suggested to first give an over- 
view, then to enable narrowing the scope, and to give the details only when the user 
requests them [16], and for Internet use on small screen devices, pre-processed sum- 
marization views that provide context information and enable view specific searching 
have been shown to be useful [3, 4, 5, 6, 8]. We have found that in addition to visual
interfaces, approaches of this kind are also useful in voice services, like voice e-mail
and calendar.

It was noticed that for a multi-channel service, it is crucial to enable easy switching 
between access methods, and to provide similar content across different access meth- 
ods, thus enabling users to use what they have available at a time. This is in line with 
recommendations for e-commerce services [19]: if users cannot use what they have at 
hand or will lose the thread of what they are doing, they may very well not perform 
the action at all or move to using another channel or service. Good ways to enable use 
over different access methods include supporting the same simple, such as numeric- 
only, passwords over different mobile platforms, and making the authentication easy 
and fast. Providing various ways to use a service makes the service useful and moti- 
vating for a broad audience. Providing similar content across different media is chal- 
lenging, though, as for example, SMS, WAP and Voice as access methods provide 
very different interaction design possibilities, each with their own particular limita- 
tions. 

5 Conclusions 

Providing similar content across different access methods is crucial for mobile com- 
munication applications. Designing to enable easy one-handed navigation is a good 
way to keep the applications simple, and thus scalable for different screen sizes and 
devices. These issues are important for multi-channel delivery on future handheld 
devices, as soon it will be possible to use the same content almost as such for various 
devices, and the device-specific modifications, when necessary, can be made for ex- 
ample with different style sheets. For instance, XHTML MP, the language of the 
future version 2.0 of WAP, can be viewed also with large screen browsers, and thus 
“upgrading from the small screen applications”, i.e. taking the small screen applica- 
tions as the starting point for the larger interfaces, will be a feasible strategy. 

Tailoring only the most important views of the application to take full advantage of
the specific device type’s (e.g. mobile phone, Pocket PC, etc.) capabilities, while 
leaving the other views as simple as possible, enables high usability on various de- 
vices, without the need to make too many different designs. Enabling easy one- 
handed navigation is obviously an efficient design principle also e.g. when designing 
for the emerging phone clients that run on the Java or Symbian platforms, or when 
porting existing mobile applications to these new environments. 


Abstract. This paper tries to address the question of how to provide added value
to mobile people through mobile applications. Our suggestion for the next generation
of value added mobile applications, following the support for (i) communication
and (ii) information access, is (iii) to provide mobile people with services
that are very specific for the context area - an area representing a specific context
- these people are currently in and (iv) to support the networking of individuals
to form communities. We envision communities being built of humans who
come together in physical proximity, are in equal or similar situations (e.g.
people waiting at train or tram stations), or have the same interests. More
generally speaking, these communities will be built of humans who come together in
the same context area, where a context area can be constrained physically or logically.
Therefore, this work introduces the notion of wireless context area networks
(WCANs) as an enabler of ubiquitous information access.



1 Introduction 

The current most ubiquitous mobile application 1 is mobile telephony. Looking back to
the past, the objective of this application was to satisfy the communication needs of
humans. People wanted to stay in touch with their families, relatives, colleagues etc.
at any time and from anywhere. This is still the objective at present, but nowadays we
can observe another goal of mobile applications too: people increasingly want
seamless access to required information like personal data or context (e.g. location)
dependent information. Like mobile telephony, most currently developed mobile
applications addressing this need follow a one-to-one communication paradigm.

However, there still exists the question for mobile operators of how to provide added
value to their customers through mobile applications. This is especially true for Europe.
A number of consortia address this problem and aim at developing scenarios of
innovative mobile applications or at laying the foundations for building them [1-3]. Our
suggestion for the next generation of value added mobile applications, following the
support for (i) communication and (ii) information access, is (iii) to provide mobile people
with services that are very specific for the context area these people are currently in and
(iv) to support the networking of individuals to form communities.

1 By a mobile application we understand an application, where at least a part of the application
executes on a mobile device and this device in turn communicates with at least one other
stationary or mobile device via a wireless connection.

F. Crestani et al. (Eds.): Mobile and Ubiquitous Info. Access Ws 2003. LNCS 2954, pp. 42-53, 2004.

We envision communities being built of humans who come together in physical
proximity, are in equal or similar situations (e.g. people waiting at train or tram
stations), or have the same interests. More generally speaking, these communities will
be built of humans who come together in the same context area. Context areas are
constrained by physical or logical borders. Examples of context areas defined by physical
boundaries are sport stadiums, railway stations, airports, or a university campus.
Context areas that are constrained logically can be defined through activities or tasks people
are engaged in, like waiting at a tram station, driving on the highway, waiting in front
of a concert hall and the like. As opposed to the traditional one-to-one communication
paradigm, the communication paradigm deployed here ranges from one-to-many to
many-to-many, as known from chat-rooms and groupware systems [4].

Based on this motivation, the problem addressed by this work is to create value-added
mobile and ubiquitous applications by providing mobile people with services that are
very specific to the context area these people are currently in, and by supporting the
networking of individuals to form communities. In the following we refer to those
context-area-specific services as contextual services and to those communities built of
humans who come together in the same context area as context based communities (CBCs).

The rest of this document is structured as follows: Section 2 describes the vision of
creating value-added contextual services for mobile people. Section 3 describes the idea
of WCANs as an enabler of CBC applications. Preliminary results of analyzing the
requirements of mobile people and the scenarios of contextual services and CBC
applications developed so far are summarized in Section 4. Related work is surveyed in
Section 5. The document closes with concluding remarks and an outlook on further work.



2 Wireless Context Area Networks (WCANs) 

In this section we describe our vision for creating value-added contextual services and
CBC applications.

To address the problem of creating added value through mobile and ubiquitous
applications, this work introduces the notion of wireless context area networks (WCANs).
Wireless context area networks are motivated by two facts:

1. Contextual information as a vital ingredient for a successful mobile and ubiquitous
application

2. The boundary principle [5]



Contextual Information as Vital Ingredient 

If we compare the execution context of a mobile application with that of an application
running on a fixed desktop computer, we observe that the context of mobile applications
is highly dynamic (e.g. context of movement linked with a changing location, ambient
conditions, available interfaces, bandwidth, user tasks and habits, personal interests,
temporal and spatial situations etc.).

As a forerunner of future mobile applications, one can observe three typical questions
asked when calling someone on a mobile phone. These questions are:

1. “Where are you just now?” (time known, context unknown)

2. “What are you doing just now?” (time known, context unknown)

3. “Am I disturbing you?” (time known, context unknown)

Looking at these questions we can see that elementary questions are not answered even in
the most ubiquitous mobile application, namely mobile telephony. From this we derive a
great potential for future mobile and ubiquitous applications that utilize contextual
information. Dey defines this usage of contextual information as context awareness
([6], p. 6):

“A system is context-aware if it uses context to provide relevant information and/or
services to the user, where relevancy depends on the user’s task.”

When a user sits in front of a desktop computer and interacts with one or more
applications, the execution context is mainly static. For upcoming mobile and ubiquitous
applications, where the execution context is fairly dynamic, we believe that using
contextual information to provide relevant information and/or services to the user will
be vital for success. Therefore, mobile applications have to be aware of their execution
context [7], more so than traditional applications.



The Boundary Principle 

The second fact that lays the basis for wireless context area networks is the boundary 
principle of Kindberg and Fox [5], which says the following: 

“Ubicomp system designers should divide the ubicomp world into environ- 
ments with boundaries that demarcate their content. A clear system boundary 
criterion - often, but not necessarily, related to a boundary in the physical world 
- should exist. A boundary should specify an environment’s scope but doesn’t
necessarily constrain interoperation.”

In line with Kindberg and Fox, a WCAN has to be understood as an autonomous network
(cf. 2.1) that is characterized by a specific context. The locality of this network
should not constrain interoperability beyond the boundaries of this network. Therefore,
the scope of a WCAN is not necessarily constrained by physical borders; a context area
can also be defined by logical boundaries. Examples of context areas defined by physical
boundaries are sport stadiums, railway stations, airports, or a university campus.
Context areas that are constrained logically can be defined through activities or tasks
people are engaged in, like waiting at the tram station, driving on the highway, waiting
in front of the concert hall and the like. The idea of WCANs is - according to the
boundary principle - to subdivide the environment into areas that represent a specific
context. “The real world consists of ubiquitous systems, rather than ‘the ubiquitous
system’” [5]. A possible set of context areas a human might meet during a workday is
illustrated in Figure 1.

Fig. 1. Relevant Context Areas During a Workday

Taking into consideration that the user is currently present in one of these context
areas, relevant information and/or services can proactively be provided to him,
preferably depending on his preferences, habits and tasks (cf. 2: context awareness).
This leads to the notion of WCANs as service environments, which is described in the
next section.

2.1 WCANs as Service Environments 

The idea of subdividing the environment into areas of specific context goes along with
the aim of providing services, applications and information that are very specific to
that area. Closely related in this respect is the AROUND project described by Jose [8]
and Jose et al. [9]. That work is about supporting the association of services with
location in such a way that mobile applications can select services relevant to their
location. A service-based architecture that supports location-based service selection is
presented. A central element of this architecture is a scope model that assumes each
service has an associated scope specifying the physical range in which it should be
available. This is very similar to our notion of context areas. Nevertheless, the
primary context information used within the AROUND project is location, while we also
consider context areas that are constrained logically, e.g. through activities or tasks
people are engaged in.
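The distinction between physically and logically constrained context areas can be illustrated with a minimal Python sketch. It is neither the AROUND scope model nor an implementation from this work: the names `UserContext`, `ContextArea`, `physical_area`, `logical_area` and `available_services`, the rectangular boundary, and the example services are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class UserContext:
    """Hypothetical snapshot of a user's context: position plus current activity."""
    x: float
    y: float
    activity: str


@dataclass
class ContextArea:
    """A context area: a boundary test and the services specific to the area."""
    name: str
    contains: Callable[[UserContext], bool]
    services: List[str] = field(default_factory=list)


def physical_area(name, x_min, y_min, x_max, y_max, services):
    """Context area constrained by a physical boundary (here: a rectangle)."""
    def inside(ctx):
        return x_min <= ctx.x <= x_max and y_min <= ctx.y <= y_max
    return ContextArea(name, inside, list(services))


def logical_area(name, activity, services):
    """Context area constrained logically by the activity a person is engaged in."""
    return ContextArea(name, lambda ctx: ctx.activity == activity, list(services))


def available_services(areas, ctx) -> Dict[str, List[str]]:
    """Map every context area the user is currently in to its services."""
    return {area.name: area.services for area in areas if area.contains(ctx)}


areas = [
    physical_area("airport", 0, 0, 100, 100, ["flight info", "gate map"]),
    logical_area("waiting at tram station", "waiting_for_tram",
                 ["query route", "get ticket"]),
]
ctx = UserContext(x=250.0, y=40.0, activity="waiting_for_tram")
print(available_services(areas, ctx))
# prints {'waiting at tram station': ['query route', 'get ticket']}
```

Representing the boundary as a predicate keeps both kinds of area behind one interface, so a user can be in a physical and a logical context area at the same time.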

A car driver A, for instance, can relay information about a traffic jam he recently
passed. Drivers in the oncoming traffic would run into this traffic jam. For those
drivers the information driver A provides will be valuable in order to avoid the
traffic jam. In this scenario information arises in the context area road traffic and
is also consumed in this context area. This is what is meant by context areas as
autonomous networks mentioned above. What happens here is a form of context sensitive
ad hoc communication, as described by Yau et al. [10]. Furthermore, the participants in
the road traffic in this scenario can be viewed as being part of a context based
community.

Another example is an indoor tennis court, where visitors are interested in all topics
concerned with tennis, for instance information about persons who played on this court
before, results of previous games, or which events and tournaments are planned for the
future.

Looking at these examples we can see that one service that is of value in various
context areas is relaying information to other persons. This is just one service; a
number of further services that may be of interest in various context areas can be
considered. Nevertheless, there will also be services that are only of value in a very
specific context. Therefore, single context areas can be understood as service
environments.

Two examples of service environments are explained in the following:

1. Interactive conference

2. Mobile passenger information and ticketing

Interactive Conference 

The interactive conference service environment consists of services that may be of
interest to participants of a scientific conference. The conceptual model of a service
space at the conference site can be imagined as shown in Figure 2.




Fig. 2. Service Space at Conference Site 



When a conference participant enters the conference area, the start page of the
interactive conference service environment appears on the display of his mobile device
(Figure 3).

As every fixed device (conference site infrastructure) and mobile device (participants'
mobile devices) can act both as service consumer and service provider, the available
space of services at the conference site can grow if participants also act as service
providers. The result is a common shared service space. In the interactive conference
scenario the conference backbone infrastructure will primarily act as service provider.

Fig. 3. Interactive Conference: Available Services



Mobile Passenger Information and Ticketing 

Another example of a specific service environment is the mobile passenger information 
and ticketing service space for public transport. Available services for passengers are 
shown in Figure 4. 

The Query Route service might be of interest to passengers in the following situation:

Gerhard is sitting in the tram. Suddenly he remembers that he has promised his son to
show him a photo of an airplane taking off when he comes home. Thus, he decides not to
go home, but instead to go to the airport to take the promised photo. The question for
Gerhard now is how to get to the airport. He uses his mobile companion to query the
route to the airport as shown in Figure 5. Using the “Ticket Info” service he can check
if his current ticket is valid for the trip to the airport. If not, he can use the
“Get Ticket” service to order the required ticket.

2.2 Humans as Travellers between Context Areas 

It is natural that humans travel between the two above-mentioned context areas or the
context areas shown in Figure 1. Furthermore, people use the services provided within
the individual context areas. Through the usage of the mobile device - the user’s
personal information appliance - in distinct context areas the boundaries between these
areas are softened. Interoperation among various context areas is thus possible
(cf. 2: boundary principle).

Fig. 4. Mobile Passenger Information and Ticketing: Available Services



Additionally, the services on the mobile devices of humans who are concurrently within
the same context area can interact, thus enabling context sensitive ad hoc
collaboration of those humans. This can be the exchange of information, music sharing,
or getting in contact with persons who have similar interests etc. What happens then is
a form of community building, which is described in the following section.



3 WCANs Enabling Context Based Communities 



In order to address the current widespread problem of creating added value through
mobile and ubiquitous applications we suggest supporting the networking of individuals
to form communities. This objective of mobile and ubiquitous applications, together
with the provision of contextual services, follows the goals of (i) supporting humans
in staying in touch with their families, relatives, colleagues etc. and (ii) supporting
seamless access to required information like personal data or context dependent
information.

We see the added value of mobile and ubiquitous applications in supporting community
building among persons who reside in the same context area. Where does this aim come
from? To answer this, let us look at closely related work and at the history of social
interaction among persons.

Fig. 5. Query Route Service

In his theory of proxemics Edward T. Hall [11-13] established the idea that there
are distinct levels of proximity in interpersonal communication:



- Intimate space - the closest “bubble” of space surrounding a person. Entry into this 
space is acceptable only for the closest friends and intimates. 

- Personal space - is used for conversations among friends and family members.

- Social space - the space in which people feel comfortable conducting routine social 
interactions with acquaintances as well as strangers. 

- Public space - the area of space beyond which people will perceive interactions as 
impersonal and relatively anonymous. 

Kortuem [14] builds on Hall’s concept of social space. The augmentation of social
interactions and social space is the key mechanism of his alternative model of wearable
computing, called social wearable computing. Beyond augmenting humans’ sensory
capabilities by wearable computers during face-to-face social interactions, as done by
Kortuem, we believe that there is a need to support the interaction of people who
reside within the same context area (cf. 2). Therefore, we suggest extending Hall’s
spatial zones by the notion of contextual space, where from now on we use contextual
space as a synonym for context area. We place contextual space between social space and
public space: interaction with people in the same context area is a form of social
interaction, and since this interaction is based, for instance, on sharing the same
interests, it is not perceived as that anonymous. In fact, sharing the same interests
or being in the same situation as others lays the basis for humans feeling part of a
community - a context based community.

Examples of such communities are people visiting a sports event like a Formula 1 race;
people in a concert area waiting for the musicians to start their performance; people
waiting at (different) train and tram stations; people driving on a highway etc. A
well-known example of such a community application is the Hocman prototype [15], which
supports social interactions among motorcyclists.

We see wireless context area networks as an enabler for building applications
supporting the interaction of humans residing in the same context area. However, one
interesting challenge will be to determine the “proximity” - context proximity -
between people who meet in a logical context area.

4 Common Patterns 

In this section we briefly describe the requirements of mobile people as well as the
characteristics that were identified when analyzing scenarios of context areas and the
provided contextual services. This is just a preliminary result, as the development and
analysis of scenarios is still ongoing work and the above-mentioned examples of context
areas represent just a subset of the already developed scenarios.

Requirements of mobile people:

- Information exchange/dissemination with/to other parties (people or services) 

- Information capturing 

- Accessing context dependent, timely information 

- Time independent usage and provision of information and services 

- Supporting community building (i.e., to get acquainted with persons of the same
interests)

- Usage of services to shorten waiting- or idle times (e.g. games) 

- Guide-me services 

- Mobile access to discussion forums (but not that high priority) 

Among the issues that can primarily be seen as common characteristics of contex- 
tual services are: 

- Context discovery / information discovery / service discovery 

- Service deployment (e.g. games, individual services) 

- Frequent (and often unpredictable) context changes

These lists will be extended as analysis of scenarios continues. 



5 Related Work 

As presented in this paper, the work of Jose [8] and Jose et al. [9] is closely related
in terms of providing services that are relevant within a specific area. The work of
Kortuem [14] is closely related to the proposed work in terms of supporting
interactions among mobile people. Nevertheless, the concept proposed within this work
aims at supporting interaction of people residing in the same context area, rather than
augmenting face-to-face interactions as done by Kortuem. Other related (conceptual)
work is Gaia [16].

Platforms and frameworks that facilitate application development for mobile
environments are important for this work. Some existing ones that provide support for
issues inherent in ubiquitous systems are the following: the Context Toolkit by Dey and
Abowd [17], for instance, focuses on the development of context aware applications.
Proem by Kortuem et al. [18] and XMIDDLE by Mascolo et al. [19] provide computing
platforms for mobile ad hoc applications, whereby the latter especially focuses on
synchronization mechanisms for data replicated on several mobile devices. LIME by
Murphy et al. [20] is targeted towards physical mobility of users and their mobile
devices and logical mobility of code.

Dividing the environment into context areas (or spaces) in order to provide or use
information and services that are specific to that area also suggests using space-based
technologies for this purpose. Therefore, the suitability of space-based technologies
as a platform for mobile context sensitive services has to be evaluated. The
above-mentioned LIME is one of those space-based technologies. Further examples of that
technology are CORSO [21], JavaSpaces [22], TSpaces [23], Limbo [24], and
GigaSpaces [25].

Ongoing work in the service discovery domain incorporating the characteristics of 
mobile environments is also of interest for the described work. Konark [26] is a ser- 
vice discovery and delivery protocol designed specifically for ad hoc, peer to peer net- 
works, and targeted towards device independent services in general and m-Commerce 
oriented software services in particular. Handorean and Roman [27] describe a
service model built on top of LIME for service provision in ad hoc networks. JDSP
(JESA Service Discovery Protocol) [28] also aims at efficient service discovery in ad
hoc networks. Lee and Helal [29] describe the use of context attributes for dynamic
service discovery. Questions that have to be answered in order to facilitate service loca- 
tion, provision, and access in mobile and ubiquitous environments include: How can a 
mobile device detect a remote service in mobile and ubiquitous environments? How can 
a mobile device access a remote service in mobile and ubiquitous environments? How 
can a device advertise its desire to provide services to the rest of the members residing 
in the same context area? Project JXTA [30] can be an answer to these questions. 



6 Conclusion and Further Work 

In this paper we have presented our vision of creating value-added mobile and
ubiquitous applications through the provision of contextual services and the support of
networking of individuals to form communities. Based on the concept of wireless context
area networks, areas representing a specific context act as service environments. These
context areas or service environments can be constrained physically or logically.
Humans, as travellers between context areas, use their personal information appliances
to access services in the respective context area. Application scenarios of contextual
services and CBC applications were presented, as well as some of the requirements of
mobile people and of software supporting the development of contextual services and CBC
applications.

As future work we will continue developing scenarios of contextual services and CBC
applications in various context areas and analyzing them in order to identify common
aspects and patterns that act as requirements for software support to realize the
scenarios. Based on these requirements we will perform detailed investigations of
platforms and frameworks that seem promising for the realization of the scenarios. The
goal is to create a framework for the development of WCANs as an enabler of contextual
services and CBC applications. In doing so, we will build on the approaches identified
as promising. The developed concept and WCAN framework will be evaluated through
prototypical implementations of some of the developed scenarios.



Abstract. This paper addresses the issue of finding and accessing online educational
resources from mobile wireless devices. Accomplishing this task with a regular Web
search-and-browse interface demands good interface skills, a large screen, and a fast
Internet connection. Searching for the proper interface to access multiple resources
from a mobile computer, we have selected an approach based on self-organized hypertext
maps. This paper presents our approach and its implementation in the Knowledge Sea
system. It also discusses related research efforts and reports the evaluation of our
approach in the context of a real classroom.



1 Introduction 

The modern Web is the largest treasury of educational resources that has ever been
available. It is customary nowadays for college professors to recommend a set of useful
Web resources for any lecture and to encourage the students to find more relevant
resources themselves. It is currently anticipated that the students access these
resources from computers at home or in the university labs. This model contradicts the
popular "anytime, anywhere" slogan of Web-enhanced education. While the Web is always
"present", the students can’t yet access it from anywhere. This certainly restricts
educational flexibility - like a requirement to read a textbook always at home or in
class, but not outside, in a cafe, or while riding a bus. The use of mobile wireless
handheld devices potentially allows the students to access educational resources really
"anywhere"; however, a number of steps have to be performed to make it really happen.
The problem here is not simply technical. Supplying all students with wireless handheld
computers and providing a wireless connection in some large area is an important step
towards the solution, but is not the solution in itself. The problem is that almost all
expository and objective Web-based educational resources have been designed for
relatively large screens and relatively high bandwidth. Special research efforts have
to be invested to develop educational resources that are suitable for use with handheld
devices or to adapt existing resources for the new platform.

The goal of our group at the Department of Information Science and Telecommunication at
the University of Pittsburgh is to explore different ways in which mobile wireless
devices can be used for college education. Having both information science and
telecommunication faculty under the same roof, a school-wide wireless network, and
dozens of wireless handheld devices, we have a very good setting for developing new
systems and exploring them in the classroom. The focus of one of our research projects
is the access to multiple educational Web resources from mobile devices. As we have
mentioned above, a variety of Web resources is available for any course. The resources
often overlap and complement each other, so multiple resources have to be used for
studying almost any topic. For example, in our "Programming and Data Structures" course
based on the C language, we recommend that students use several free C language
tutorials and other on-line resources (such as the C language FAQ). Different tutorials
cover different topics in different levels of detail and in different styles.
Altogether, they complement the course textbook well and enable students with different
levels of knowledge or different learning styles to get a better comprehension of the
subject.

Unfortunately, it is hard to expect a teacher to provide a list of relevant readings
for a lecture from more than one source (usually a textbook). What a teacher usually
can do is provide links to the home pages of all these tutorials, hoping that the
students will be able to locate tutorial fragments that are relevant to each lecture.
However, as we have found in the course of our research, the students almost never do
it. Even on a desktop computer, finding relevant reading fragments buried deep under
the tutorial home pages and distributed over several tutorials is a challenging
activity that requires good navigation skills, a large screen, and a fast Internet
connection (Figure 1). Mobile computers with small screens and slower connections need
another interface to accomplish the same task.




Fig. 1. Studying from multiple on-line resources



Searching for the proper interface to access multiple resources on a mobile computer,
we have considered several options and finally selected an approach based on
self-organized hypertext maps. This paper presents our approach and its implementation,
discusses related work, and reports the results of using our approach in the context of
a real classroom.



2 Navigating Multiple Educational Resources with a Self-organized Map

The core of our approach to navigating educational resources is a self-organized 
hyperspace map. Hyperspace maps are generally regarded as one of the most impor- 
tant tools in hypertext navigation. A map can provide concise navigation and orienta- 
tion support for a relatively large hyperspace. Traditionally hypertext maps are de- 
signed manually by hypertext authors. This manual approach is totally inappropriate 
for a heterogeneous distributed Web hyperspace that has no single author. However, 
there are a number of known approaches to automated or automatic building of hyper- 
text maps. The approach that we have chosen is based on the Self-Organizing Map 
(SOM), an artificial neural network that builds a two dimensional representation of 
the inputs. SOM is a very attractive technology for developing compact maps for a 
large hyperspace since it builds a map representing only the neighborhood relation- 
ship between the objects. In these maps only the relative distance between objects is 
reported and any other information is lost. 




Fig. 2. A session of work with the Knowledge Sea system 

A two-dimensional map of educational resources developed with SOM technology 
is the core of our Knowledge Sea system for map-based access to multiple educa- 
tional resources (Figure 2). Knowledge Sea was designed to support a typical univer- 
sity class on C programming. In this context, the goal of the students is to find the 
most helpful material as a part of readings assigned for every lecture in the course. 
The most easily available Web educational resources are multiple hypertextual C 
tutorials. In this context, the goal of the Knowledge Sea system is to help the user 
navigate from lectures to relevant tutorial pages and between them. 

The users see the Knowledge Sea map as an 8-by-8 table (Figure 2). Each cell of the map
is used to group together a set of educational resources. The map is organized in such
a way that resources (web pages) that are semantically related are close to each other
on the map. Resources located in the same cell are considered very similar, resources
located in directly connected cells are reasonably similar, and so on.

Each cell displays a set of keywords that helps the user locate the relevant section 
on the map. It also displays links to “critical” resources located in the cell. By critical 
resources we mean resources that are known to the user and that can serve as origin 
points for map-based navigation. For example, for lecture-to-tutorial navigation the 
critical resources are lectures and lecture slides known to the users (see two map cells 
in the enlarged section on the upper left part of Figure 2). The cell color indicates the 
"depth of the information sea" - the number of resource pages lying "under the sur- 
face" of the cell. Following the "information sea" metaphor we use several shades of 
blue in the same way they are used on geographic maps to indicate depth. For exam- 
ple, light blue indicates "shallow" cells with just a few resources underneath while 
deep blue indicates "deep cells" that have the largest number of resources. The re- 
sources "under" the cell can be observed by "diving". A click on the red dot opens the 
cell content window (right on Figure 2) that provides a list of links to all tutorial 
pages assembled in the cell. A click on any of these links will open a resource- 
browsing window with the selected relevant page from one of the tutorials. This page 
is loaded "as is" from its original URL. A user can read this page and use it as a start- 
ing point to navigate an area of interest in the tutorial. 
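The "depth" coloring described above can be sketched as a mapping from a cell's resource count to a shade of blue. This is only an illustration under stated assumptions: the function name `depth_shade`, the four-step palette, and the linear scaling are hypothetical, since the text says only that several shades of blue are used.

```python
def depth_shade(count, max_count,
                shades=("#e6f2ff", "#99ccff", "#3385ff", "#003d99")):
    """Pick a shade of blue for a cell: more resources means a deeper blue.

    `shades` runs from light ("shallow") to deep; the palette is an assumption.
    """
    if max_count <= 0 or count <= 0:
        return shades[0]
    # Scale the count linearly into the index range of the palette.
    index = (count * len(shades) - 1) // max_count
    return shades[min(index, len(shades) - 1)]


# Resource counts per cell for a hypothetical row of the 8-by-8 map.
row = [0, 3, 11, 25, 40, 18, 7, 1]
print([depth_shade(n, max(row)) for n in row])
```

Scaling against the largest cell keeps the deepest blue reserved for the cells with the most resources, whatever the absolute size of the collection.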

The map serves as a mediator to help the user navigate from critical resources to
related resources. The links to critical resources work as landmarks on the map and,
together with the keywords, give an idea of the material organized by the map. If the
user is interested in finding some additional information on the topic of lecture 14
(devoted to pointers), the first place to look is the cell where the material of this
lecture is located (shown as the L14 link in the enlarged section of Figure 2). If the
user is looking for material that can extend the topic of the lecture in some
particular direction, the cells that are close to the original cell provide several
possible directions in which to deviate. For example, the material related to memory
usage in the context of pointers is located underneath the cell with the L14 mark. The
links to other critical resources shown on the map can help in selecting the right
direction for deviation. For example, a good place to look for material that can
connect the content of lectures 14 and 15 is a cell between the cells where the L14 and
L15 links are shown. The map helps the user to select the page related to the original
in the “right” sense.
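The landmark-based navigation can be made concrete with a small sketch over the 8-by-8 grid. It assumes that "directly connected" means the four horizontal and vertical neighbours and that "between" means lying on a shortest grid path; both readings, the function names, and the cell positions chosen for L14 and L15 are hypothetical.

```python
def grid_distance(a, b):
    """Number of horizontal and vertical steps between two cells (row, col)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def neighbours(cell, size=8):
    """Directly connected cells of `cell` inside a size-by-size map."""
    row, col = cell
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    return [(row + dr, col + dc) for dr, dc in steps
            if 0 <= row + dr < size and 0 <= col + dc < size]


def cells_between(a, b, size=8):
    """Cells lying on some shortest grid path from a to b, excluding a and b."""
    d = grid_distance(a, b)
    return [(r, c) for r in range(size) for c in range(size)
            if (r, c) not in (a, b)
            and grid_distance(a, (r, c)) + grid_distance((r, c), b) == d]


l14, l15 = (2, 3), (2, 5)       # assumed landmark positions
print(neighbours(l14))          # candidate directions in which to deviate from L14
print(cells_between(l14, l15))  # prints [(2, 4)]
```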



3 The Mechanism of the Self-organizing Map 

The Knowledge Sea map is automatically built by an artificial neural network.
Artificial neural networks are formed by a set of interconnected simple processing
units that can “learn” to process the input data by using a supervised learning
algorithm or using self-organization. The neural network used to build the document map
is the Self-Organizing Map (SOM, sometimes referred to as the Kohonen map) [4]. In this
neural network the units are organized in a sort of elastic lattice, usually
two-dimensional, placed in the input space (in our case the hyperspace spanned by the
set of documents). During the learning phase this lattice “moves” towards the input
points. This “movement” becomes slower and at the end of the learning stage the network
is “frozen” in the input space.

After the learning stage the units of the map can be labeled using the input vectors
and the map can be visualized as a two-dimensional surface with the input vectors
distributed on it. Input vectors that are near each other in the input space are near
each other on the map (Figure 3).
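Labeling the units of a trained map amounts to assigning every input vector to the unit whose weight vector is closest to it. The following sketch shows this under stated assumptions: the 2-by-2 map, its weights, the page names, and the two-dimensional document vectors are invented for illustration and are not data from the Knowledge Sea system.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def label_units(weights, inputs):
    """Group input names by the unit whose weight vector is closest."""
    cells = defaultdict(list)
    for name, vector in inputs.items():
        unit = min(weights, key=lambda u: squared_distance(vector, weights[u]))
        cells[unit].append(name)
    return dict(cells)


# Hypothetical "frozen" map: grid position -> weight vector.
weights = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 0.0],
           (1, 0): [0.0, 1.0], (1, 1): [1.0, 1.0]}
# Hypothetical document vectors.
inputs = {"pointers.html": [0.9, 0.1],
          "memory.html": [0.8, 0.2],
          "loops.html": [0.1, 0.9]}
print(label_units(weights, inputs))
# prints {(0, 1): ['pointers.html', 'memory.html'], (1, 0): ['loops.html']}
```

The number of names collected per unit is what a map like Knowledge Sea can display as the "depth" of a cell.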




Fig. 3. The organization of different input and the structure of the map



3.1 SOM Algorithm 

The SOM algorithm is explained below with reference to an $N_1 \times N_2$ rectangular grid
(the extension to a hexagonal grid that does not favor horizontal and vertical directions is
straightforward).

Each unit $i$ ($i = 1, 2, \ldots, N_1 \times N_2$) has a weight vector:

$$w_i(t) \in \mathbb{R}^n \qquad (1)$$

where i defines the position of the unit inside the array. The SOM model also contains 
the h(c,i,t) function that defines the "stiffness" of the elastic surface to be fitted to the 
data points. This function depends on the relative position of the two units c and i on 
the network grid and contains some parameters that are updated during the learning 
stage. 

Suppose we have a set of $m$ training vectors $X = \{x_k,\ k = 1, 2, \ldots, m\}$, with $x_k \in \mathbb{R}^n$.
During the learning stage these vectors are presented to the network. After a sufficient
number of learning steps the weight of each neural unit will specify a codebook vector
for the input distribution; these codebook vectors will sample the input space.

The unit weights (codebook vectors) will be organized such that topologically 
close units of the grid are sensitive to inputs that are similar. The learning algorithm is 
below: 

1. Initialize the unit weights $w_i$, the discrete time $t = 0$, and the parameters of the
function $h(c,i,t)$;

2. Present the input vector $x \in X$;

3. Select the best matching unit $c$ (b.m.u.) as:

$$\|x - w_c\| = \min_{i = 1, 2, \ldots, N_1 \times N_2} \|x - w_i\|$$

4. Update the network weights:

$$w_i(t+1) = w_i(t) + h(c,i,t)\,[x - w_i(t)], \qquad i = 1, 2, \ldots, N_1 \times N_2$$

5. Update the parameters of the function $h(c,i,t)$;

6. Increment the discrete time $t$;

7. If $t \le t_{max}$ then go to step 2.

The learning function is indicated in step 4. In this step the b.m.u and the nodes 
that are close to the b.m.u in the array will activate and update their weight vectors 
moving towards the input vector (Figure 4). 
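
The seven steps above can be sketched in a few lines of Python. This is only a minimal illustration and not the simulator used to build the Knowledge Sea map: the function names, the random initialization, the linear decay of the learning rate and of the neighborhood radius, and the use of a square neighborhood on a rectangular grid are all assumptions made for the example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Step 3: index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(data, rows, cols, t_max, alpha0=0.8, radius0=None, seed=0):
    """Train a rows x cols rectangular SOM; units are stored in row-major order."""
    rng = random.Random(seed)
    dim = len(data[0])
    if radius0 is None:
        radius0 = max(rows, cols) / 2  # assumed starting radius
    # Step 1: random initial weights, one vector per grid unit.
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    for t in range(t_max):  # Steps 6 and 7: advance the discrete time
        # Step 5, done here at the top of each iteration:
        # learning rate and radius both decay linearly (an assumption).
        frac = 1.0 - t / t_max
        alpha = alpha0 * frac
        radius = radius0 * frac
        x = rng.choice(data)                # Step 2: present an input vector
        c = best_matching_unit(weights, x)  # Step 3: find the b.m.u.
        c_row, c_col = divmod(c, cols)
        for i, w in enumerate(weights):     # Step 4: update the weights
            row, col = divmod(i, cols)
            # Square neighborhood: h = alpha inside, 0 outside.
            if max(abs(row - c_row), abs(col - c_col)) <= radius:
                for d in range(dim):
                    w[d] += alpha * (x[d] - w[d])
    return weights
```

For instance, `train_som(vectors, 8, 8, t_max=32000)` would train an 8x8 grid on a list of equal-length numeric vectors.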




Fig. 4. A representation of the SOM learning algorithm. The gray area is the neighborhood of 
the best matching unit 

The amount of movement is modulated by $h(c,i,t)$, the so-called neighborhood
function, a smoothing kernel defined over the lattice points. For the convergence of
the algorithm it is necessary that:

$$\lim_{t \to \infty} h(c,i,t) = 0 \qquad (2)$$

The function $h(c,i,t)$ takes its maximum value at the b.m.u. and decays on the units that
are distant from it. In the literature two functions are often used for $h(c,i,t)$: the simpler
one refers to a square neighborhood set of array points around the b.m.u., as shown in
Figure 5. If their index set is denoted $N_c(t)$, then the function is defined as:

$$h(c,i,t) = \begin{cases} \alpha(t) & \text{if } i \in N_c(t) \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

where:

• $N_c(t)$ is a function of time that shrinks as time advances;

• $\alpha(t)$ is the learning rate, which decreases monotonically with time.

The other widely applied smoothing neighborhood kernel is written in terms of the 
Gaussian function. 
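
The two kernels can be compared side by side in a short Python sketch. The square kernel follows Equation (3). The text does not spell out the Gaussian kernel, so the form used here, $\alpha(t)\exp(-d^2 / 2\sigma(t)^2)$ with $d$ the distance between units $c$ and $i$ on the grid, is an assumption based on the form commonly used for SOMs; the function names and the `(row, col)` representation of grid positions are also illustrative.

```python
import math


def bubble_kernel(c, i, alpha, radius):
    """Equation (3): alpha inside the square neighborhood N_c(t), 0 outside.

    c and i are (row, col) positions on the grid.
    """
    inside = max(abs(c[0] - i[0]), abs(c[1] - i[1])) <= radius
    return alpha if inside else 0.0


def gaussian_kernel(c, i, alpha, sigma):
    """Assumed Gaussian form: alpha * exp(-d^2 / (2 * sigma^2))."""
    d2 = (c[0] - i[0]) ** 2 + (c[1] - i[1]) ** 2
    return alpha * math.exp(-d2 / (2.0 * sigma ** 2))
```

Both kernels take their maximum value `alpha` at the b.m.u. itself. Shrinking `radius` or `sigma` and decreasing `alpha` over time is what makes $h(c,i,t)$ tend to zero, as condition (2) requires.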

Fig. 5. $N_c(t)$ gives the set of nodes that are considered the neighborhood of the node $c$; $t_1 < t_2 < t_3$

3.2 Parameter Values 



If the SOM network is not very large (a few hundred nodes at most) the selection of
parameter values is not very crucial. As a "rule of thumb", it is possible to start with a
fairly wide $N_c(0)$, even more than half the diameter of the network, and let it
shrink with time. The exact form of the time dependence is not very important for the
learning rate $\alpha(t)$: it can be linear, exponential or inversely proportional to $t$. The
accuracy of the learning depends on the number of steps in the learning phase: it should be
at least 500 times the number of network units. There is no theoretical way to determine
the parameter values; they have to be chosen by trial and error. By empirical
observation the learning stage is divided into two phases of very different length:

• ordering phase: in this phase the network organizes the weights of the units in
order to roughly approximate the input distribution. The parameters should have
the following initial values: $\alpha_0$ close to unity (e.g. 0.8), and a smoothing kernel
large enough to cover almost the whole network when the weights are
changed.

• convergence phase: this is the refining phase in which the
vectors reach their final positions. It is 8 or 9 times longer than the ordering phase
and during this phase there are no large variations of the unit weights. The parameter
$\alpha_0$ should be small (0.2 or less) and constant or slightly decreasing. The
initial smoothing kernel should be narrow enough to change just a few
units or only the b.m.u.
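
The three shapes mentioned for the learning rate, and the rule of thumb for the number of learning steps, can be written down as follows. This is a sketch only: the function names, the final value `alpha_end` of the exponential schedule and the constant `k` of the inverse schedule are assumptions chosen for illustration, not values taken from the text.

```python
def linear_rate(alpha0, t, t_max):
    """Learning rate falling linearly from alpha0 to 0."""
    return alpha0 * (1.0 - t / t_max)


def exponential_rate(alpha0, t, t_max, alpha_end=0.01):
    """Learning rate falling exponentially from alpha0 to alpha_end (assumed)."""
    return alpha0 * (alpha_end / alpha0) ** (t / t_max)


def inverse_rate(alpha0, t, t_max, k=None):
    """Learning rate of the form alpha0 * k / (k + t); k is an assumed constant."""
    k = t_max / 100.0 if k is None else k
    return alpha0 * k / (k + t)


def min_training_steps(rows, cols):
    """Rule of thumb from the text: at least 500 steps per network unit."""
    return 500 * rows * cols
```

For an 8x8 map the rule of thumb gives `min_training_steps(8, 8)`, that is, at least 32000 learning steps.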

A rough way to evaluate the quality of the result obtained after the learning stage is to
find the b.m.u. $c$ for each input vector $x_k \in X$ and to evaluate the quantity $A$
defined as:

$$A = \frac{1}{m} \sum_{k=1}^{m} \| x_k - w_c \| \qquad (4)$$



It is convenient to calculate several maps with different initial values and to choose 
the best result. 
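
A minimal Python sketch of this selection procedure is given below. It assumes that the quality measure is the mean distance between each input vector and the weight vector of its b.m.u.; the function names are illustrative and each map is represented simply as a list of weight vectors.

```python
import math


def mean_quantization_error(data, weights):
    """Average distance from each input vector to its best matching unit."""
    total = 0.0
    for x in data:
        total += min(math.dist(x, w) for w in weights)
    return total / len(data)


def pick_best_map(data, candidate_maps):
    """Among several trained maps, keep the one with the lowest error."""
    return min(candidate_maps, key=lambda w: mean_quantization_error(data, w))
```

A lower value means that the codebook vectors sit closer to the inputs, so maps trained from different initial values can be ranked by this number.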



4 The Implementation of the System 

The neural network is just one part of the developed system. To prepare the
learning set of the SOM, the HTML documents were preprocessed to
remove "noise" (copyright notes, author names, HTML tags, C code, and so on) and
encoded using the TF*IDF approach. With TF*IDF, each document is represented by a
vector where each component corresponds to a different word. The value of the component
is proportional to the occurrence of the word in the document and inversely
proportional to its occurrence in the whole set of documents [8]. The calculation of
TF*IDF often includes a normalization factor to obtain a representation vector
that is independent of the text length.
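
The encoding can be illustrated with a short Python sketch. The exact weighting formula used in the system is not given in the text, so the common variant below (term count multiplied by the logarithm of the inverse document frequency, followed by normalization to unit length) is an assumption, as are the function and parameter names.

```python
import math
from collections import Counter


def tfidf_vectors(documents, vocabulary):
    """Encode tokenized documents as length-normalized TF*IDF vectors.

    documents: list of token lists (after noise and stopword removal).
    vocabulary: list of words; each word is one vector component.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))
    vectors = []
    for tokens in documents:
        tf = Counter(tokens)
        vec = [tf[w] * math.log(n_docs / df[w]) if df[w] else 0.0
               for w in vocabulary]
        # Normalize to unit length so that long and short texts are comparable.
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm for v in vec] if norm > 0 else vec)
    return vectors
```

With this weighting a word that occurs in every document receives a weight of zero, which reflects the idea that such a word does not help to tell documents apart.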

The document set used for the learning phase of the SOM network included 210 
HTML files from three Web-based tutorials on the C programming language. The whole
set of pages contained 4249 different words. They were represented by the 500 most 
common words after the removal of stopwords. All document representations were 
collected in a file and submitted to the neural network simulator. At the end of the 
learning phase each cell of the map collected conceptually similar pages from various 
tutorials. 

The output of the neural network simulator was used to build a set of HTML pages 
that the user accesses interacting with the system. All pages were designed to fit the 
screen of a handheld PC such as the HP Jornada. The home page of the system con- 
tains only the map visualized as an HTML table. Each cell of the table corresponds to 
a neural unit of the map and is labeled by representative keywords. 

The system is also scalable: it is possible to add new resources to the system sim- 
ply by building the TF*IDF representation and submitting the vectors to the Self- 
Organizing Map. The neural network will classify the new vectors into the right cells. 



5 A Challenge of a Narrow Screen 

In order to choose which map geometry will fit small computer devices, several different
maps were trained using different approaches. Since our first mobile platform
was the HP Jornada with a relatively wide screen, we started with a popular 8x8
SOM map. This geometry and this size provided enough space to organize all documents.
The learning stage in this case was not complicated and the standard parameter
values were sufficient. The resulting 8x8 map was successfully used by our students
for several months, and it is this map that was used in the study presented below.



Table 1. Parameter values for the Self-Organizing Map training

             Ordering phase    Convergence phase 1    Convergence phase 2
$t_{max}$    10000             30000                  50000
$\alpha_0$   0.2-0.1           0.05-0.02              0.01-0.005
$N_c(0)$     3                 2                      1



Later, when a wireless card became available for the Palm (Handspring) platform,
we started to experiment with Palm-based devices. The standard Palm screen is
relatively narrow (160 pixels). With our current Web interface it can fit only 3-4 map
cells in a row. To adapt the map approach to a Palm-size screen, we explored a
non-traditional 4x15 geometry. The goal was to obtain a visualization scrollable only in
the vertical dimension in order to make it easy to navigate the map. For this geometry the
learning stage was more complicated. First, we had to use the hexagonal geometry for
the map to pack the cells more tightly. Second, it was necessary to split the learning
phase into three sessions and to use non-standard parameter values. The parameter
values are provided in Table 1. A representation of the geometry of the two
maps is shown in Figure 6.

Fig. 6. The geometry of the 4x15 map (left) and the 8x8 map (right) after the learning phase



Despite the effort we put into developing 4x15 maps, we were not satisfied
with the results. The resulting map did not look very natural (its geometry in Fig. 6
shows it clearly) and contained too many cells with no information. We concluded
that this map could be more confusing than helpful for the students and ceased our
work with narrow screens. Fortunately, the introduction of the newest wireless Palm
devices with 320x320 screens allows us to continue our work with Palm-based devices.



6 Similar Work 

There are a number of known attempts to use SOM for developing various "informa- 
tion maps" - two-dimensional graphical representations in which all the documents in 
a document set are depicted. The documents on a SOM are grouped in clusters. Clus- 
ters that group documents on similar topics are near each other on the map. The effec- 
tiveness of the SOM as a tool to cluster information and to develop information maps 
was discussed in many research works. Some studies indicate that the clustering re- 
sults obtained with SOM maps have meaning for the users. In particular, the prox- 
imity hypothesis (related topics are clustered closely on the map) was validated in [6]. 

In the WEBSOM system a SOM document map was used as a Web interface to 
classify Usenet newsgroup articles. The paper [3] reports the application of SOM 
network to organize 4600 documents. The documents were messages from the
"comp.ai.neural-nets" newsgroup. In [5], a document map capable of organizing
131500 newsgroup messages was built using a parallel SIMD computer.

The computational complexity of a SOM neural network is particularly pronounced
with the TF*IDF representation because of the high dimensionality of the resulting
vector space. The paper [7] argued that it is difficult to generate a map for large
document collections (i.e. gigabytes of data). This paper proposed a method for improving
the speed of learning by exploiting the fact that the representing vectors are
sparse vectors with many zeros.

Our approach combines the ideas of "information mapping" using SOM with the 
ideas of dynamic navigation in an open corpus hyperspace. Our goal is not simply to 
"map" the information, but to help the user navigate from a set of critical items (for 
example, lectures) to similar items. The use of a map distinguishes our approach from 
traditional "intelligent" hypertext that explores automatic and dynamic linking. Traditional
automatic and dynamic linking ignores the user’s intelligence in finding relevant
hyperspace paths, substituting it with "machine intelligence" that can offer ready-to-use
one-click links to relevant items. Our map-based approach relies on both
"machine intelligence" in organizing a hyperspace map and the user’s own intelli- 
gence in selecting a proper link on the map. It is similar to providing a city visitor 
with a map developed by an intelligent professional guide. 



7 The Evaluation 

The functionality and the usefulness of our map-based information access approach
were evaluated in the context of two programming-related courses at the University
of Pittsburgh. Unfortunately, due to the insufficient number of Jornada organizers we
were not able to run a large-scale evaluation of our approach on mobile devices. Instead,
we performed a formative questionnaire-based evaluation of the 8x8 Knowledge
Sea map used on a desktop computer. We made the system available to the
students of our courses, logged the student interaction with the system, and administered
a non-mandatory questionnaire at the end of each course. The analysis of the
student answers to some of the questions was partially reported in [2]. It demonstrated
that students regarded Knowledge Sea as a powerful tool for accessing external
educational resources. The students were most impressed with the system’s ability to
place similar resource pages close to each other.

Only one question in the evaluation questionnaire was directly related to the issues 
of mobile access. The students were asked in which context they would expect to use 
the Knowledge Sea system from a Jornada-like device if it could be accessible from 
anywhere. The format of the question was "multiple selection"; the students were able 
to check any subset of the four offered options that ranged from "in the classroom" to
"anywhere". Figure 7 summarizes the answers of 72 students who used the system in
the context of an introductory programming course during one of three consecutive
semesters (Spring 2002 to Spring 2003). It was a surprise for us to see that the locations
selected most often (by about 60% of students) were home and library. Less than 40%
of the respondents considered using the system in class and less than 35% "from anywhere".
This shows that students are not quite ready for "anytime, anywhere" access.
They consider a mobile device more as a different kind of computer and tend to use it
in the contexts where they traditionally use computers (home, lab, and library).

Fortunately, the student attitude to the use of mobile technology in education is
changing as rapidly as mobile devices are becoming common in everyday life.
Figure 8, which splits the data presented in Figure 7 into three consecutive semesters,
shows that the percentage of students who are ready to access our system "from anywhere"
has grown steadily over the 1.5 years of our study. At the same time, the percentage
of students considering the use of mobile devices in a context where regular
computers were more appropriate has declined.

Fig. 7. Percentage of students considering the use of Knowledge Sea in different contexts

Fig. 8. The change of the percentage of students considering the use of Knowledge Sea in
different contexts over three consecutive semesters

Another observation from our study is the difference between the attitudes of
male and female students to mobile technology. As a cohort, female students who
filled in the questionnaire (17 out of 72) were slightly behind their male classmates
in being ready to use the Knowledge Sea system outside of the traditional context. As
shown by Figure 7, female students are more eager to use the system in the currently
most traditional "desktop" context - at home when working on an assignment. They
are less eager to use the technology in non-traditional places - like a lecture theatre or
a bus. Further evidence is that female students generally checked fewer of
the four offered options than male students. None of the female students selected all four
options (checking all would mean that they are ready to access our system from really
any context) while 10% of male students did so. Also, more than 47% of female students
checked just one of the four contexts while only about 40% of male students did
so.

Summarizing the results we can conclude that many students are not "mentally 
ready" to use mobile devices for educational needs "anytime, anywhere" as the pro- 
ponents of the technology hope. Moreover, female students are slightly behind their 
male classmates in embracing the technology. At the same time, the prospects of 
educational use of mobile devices look quite bright since the students’ attitude to this 
technology changes rapidly in the desired direction. 



8 Lessons Learned and Future Work

Overall, we can conclude that SOM-based access to multiple information resources is 
a very useful technology. The 8x8 map that we have explored has worked well for the 
students. This map is large enough to provide a reasonable split of diverse content, yet 
is small enough to fit a Jornada-like handheld. We are now investigating the same 
map and the same interface in the context of a larger hyperspace of educational mate- 
rial (6 and more external tutorials instead of 3). We are also developing an improved 
interface for the system and working on integrating the map-based information access 
approach with our earlier work on adaptive hypermedia [1] and adaptive Web-based 
systems to develop an adaptive version of Knowledge Sea.



Abstract. Ease of browsing and searching for information on mobile devices
has been an area of increasing interest in the information retrieval (IR) research
community. While some work has been done to enhance the usability of handwriting
recognition for query input, the characteristics of speech as an input
mechanism have not been extensively studied. It is intuitive to think that, given the
ease of speech, users would use more words when forming queries to an information
retrieval system by voice than when forming them in writing. Is this in fact the case? This
paper presents some new findings from an experimental study that tests
this intuition and assesses the feasibility of spoken queries for search
purposes.



1 Introduction 

Today, the phone is the most widely adopted communications device anywhere in the 
world. Mobile phone subscriptions are increasing faster than Internet connection 
rates. A new market study indicates that nearly 700,000 people around the world are 
signing up every day for mobile phone subscriptions, even though mobile phone calls 
cost about three times as much as calls made with fixed or "wired" telephones. By
the end of March 2002 there were 23 million mobile phone subscriptions in Taiwan,
more than its total population. In the UK, 70% of adults said they owned or used a
mobile phone and almost 4 in 5 (78%) UK homes claimed to have at least one mobile
phone, according to a survey in May 2001. The development of wireless technology enables
this huge mobile user community to take advantage of the large amount of information
stored in digital repositories and to access it anywhere and anytime
for purposes such as stock trading, e-commerce, travel reservations, order placement
and tracking, and much more. Currently, the means available for inputting a user’s
information needs are limited to keying in on a keypad or using a
stylus on the mobile phone screen. Text-entry rates for the multi-tap method on older
mobile phones are commonly 7-15 wpm; with predictive-text facilities this rate
roughly doubles [3]. Key-tapping would therefore allow the entry of a typical 10-word
question in 20-40 seconds, with continuous visual attention. Handwriting with
a stylus can be done at comparable speeds [4]. This would suffice to satisfy some
information needs. However, such an input style does not work well in
many situations, such as when users are moving around, using their hands or eyes for
something else, or interacting with another person. In addition,
screens and keyboards are of little use to those with visual impairments such as blindness
or difficulty in seeing words in ordinary newsprint, not to mention those with
limited literacy skills. In all those cases, given the ubiquity of mobile phone access,
speech-enabled interfaces, which let users access information solely via voice, have come
into the limelight of today’s IR research community.

The transformation of a user’s information needs into a search expression, or query,
is known as query formulation. It is widely regarded as one of the most challenging
activities in information seeking [1]. Research on query formulation with speech is
known as spoken query processing (SQP), which is the use of spoken queries to
retrieve textual or spoken documents. From 1997 (TREC-6) to 2000 (TREC-9), the
TREC (Text REtrieval Conference) evaluation workshop included a track on spoken
document retrieval (SDR) to explore the impact of automatic speech recognition
(ASR) errors on document retrieval. The conclusion drawn from these three years of the
SDR track is that SDR is a “solved problem” [13]. SQP has very much been focusing
on studying the level of degradation of retrieval performance due to errors in the 
query terms introduced by the automatic speech recognition system. A
corrupted spoken query transcription has a heavy impact on the retrieval ranking [15].
Because IR engines try to find documents that contain words that match those in the
query, any errors in the query have the potential to derail the retrieval of
relevant documents. Two groups of researchers have investigated this problem by 
carrying out experimental studies. One group [5] considered two experiments on the 
effectiveness of SQP. In their first experiment, they recorded 35 TREC queries (topics
101-135), with query lengths ranging from 50 to 60 words, at three word error rate
levels: 25%, 33% and 50%. The second experiment adopted substantially
shorter queries of three lengths (2-4, 5-8, and 10-15 content words) and
showed that as the query got slightly longer, the drop in system effectiveness
became smaller. Further analysis of the long queries by another group [6] showed
that the longer “long” queries are consistently more accurate than the shorter
“long” queries. In general, these experiments concluded that the effectiveness of IR
systems degrades faster in the presence of automatic speech recognition errors when
the queries are recognized than when the documents are recognized. Further, once
queries are less than 30 words, the degradation in effectiveness becomes even more
noticeable [7]. Therefore, it can be claimed that despite the current limitations of the
accuracy of speech recognition software, it is feasible to use speech as a means of 
posing questions to an information retrieval system which will be able to maintain 
considerable effectiveness in performance. However, the query sets created in these 
experiments were dictated from existing queries in textual form. Will people use
the same words, phrases or sentences when formulating their information needs via voice
as when typing onto a screen? If not, how do their written queries differ from their
spoken ones? Dictated speech is considerably different from spontaneous speech and
easier to recognise [8]. Spontaneous spoken queries would be expected to have
higher word error rates (WER) and different kinds of errors. Thus, the claim
cannot be considered valid until further empirical work clarifies the ways in which
spontaneous queries differ in length and nature from dictated ones.
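
Since word error rate is the central measure in this discussion, a minimal Python sketch of how it is commonly computed may be useful. WER is usually defined as the number of word substitutions, deletions and insertions needed to turn the recognised text into the reference text, divided by the number of words in the reference. The function name and the whitespace tokenization are assumptions made for the example; it is not the scoring tool used in the cited studies.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For a four-word query in which the recogniser gets one word wrong, the function returns 0.25, that is, a WER of 25%.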

In this paper we present the results of an experimental study on the differences between
written queries and their spoken counterparts. The paper is structured
as follows. Section 2 discusses the usefulness of speech as a means of query input.
Section 3 describes the experimental environment of the study: the test collection and
the experimental procedure. The results of this study are reported in section 4. Conclusions,
with some remarks on the potential significance of the study and future
directions, are presented in section 5.



2 The Question of Spoken Queries 

The advantages of speech as a medium are obvious. It is natural: it is how people
normally communicate. It is rapid: commonly 150-250 wpm [9]. It requires no
visual attention. It requires no use of hands. All mobile phones and many PDAs are 
equipped with microphones. 
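
The difference in speed can be made concrete with a small calculation for a 10-word question, using the rates quoted in this paper. The sketch below is illustrative only: the function name is hypothetical, and the predictive-text range is obtained by assuming an exact doubling of the multi-tap rates, whereas the text only says that the rate "roughly doubles".

```python
def entry_seconds(words, wpm):
    """Time in seconds needed to enter `words` words at `wpm` words per minute."""
    return 60.0 * words / wpm


# (slowest, fastest) rates in words per minute
rates_wpm = {
    "multi-tap": (7, 15),
    "predictive text": (14, 30),  # assumed: exactly double the multi-tap rates
    "speech": (150, 250),
}

for name, (slow, fast) in rates_wpm.items():
    print(f"{name}: {entry_seconds(10, fast):.1f}-{entry_seconds(10, slow):.1f} s")
```

Under these assumptions predictive text takes roughly 20 to 43 seconds, in line with the 20-40 seconds quoted in the introduction, while speaking the same question takes about 2.4 to 4 seconds.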

However, ASR systems are imperfect, which means that there are bound to be recognition
mistakes at different levels depending on the quality of the ASR systems.
Queries are generally much shorter than documents in the form of both text and
speech. The shorter duration of spoken queries provides less context and redundancy,
and ASR errors will have a greater impact on effectiveness of IR systems [7]. In contrast
with spoken documents, which can be processed and indexed offline, spoken
queries need to be processed online and “almost” in real time. This intensifies the
already computationally expensive recognition process and demands that the time for
speech processing be kept short, as it has been observed that user satisfaction with an
IR system also depends upon the time the user spends waiting for the system to
process the query and display the results [18]. Furthermore, speech input is not
suitable in all situations. Speech is public, potentially disruptive to people
nearby and potentially compromising of confidentiality. Speech becomes less useful
in noisy environments. The cognitive load imposed by speaking must not be ignored.
Generally when formulating spoken queries, users are not simply transcribing infor- 
mation but are composing it. For such tasks, the real limiting factor may be how 
quickly one can generate and formulate ideas. In this sense, it is no different from an 
accomplished typist who may be able to copy information quickly, but is slowed 
considerably when having to compose original text. 

However, despite the unavoidable ASR errors, research shows that the classical IR
techniques are quite robust to considerably high levels of WER (up to about 40%), in
particular for longer queries [12]. Voice is more expressive. It has more cues, including
voice inflection, pitch, and tone. Research shows that there exists a direct relationship
between acoustic stress and information content identified by an IR index in
spoken sentences, since speakers stress the words that can help to convey their messages
as expected [16]. People also express themselves more naturally and less formally
when speaking compared to writing and are generally more personal. It has
long been established that voice is a richer medium than written text [10]. Thus, we would
expect, as a result, that spoken queries would be longer than written queries.
Furthermore, the translation of thoughts to speech is faster than the translation of
thoughts to writing. To test these two hypotheses, we constructed an experiment as
described in the following section.



3 Experimental Study

Our view is that the best way to assess the differences in query formulation between
spoken form and written form is to conduct an experimental analysis with a
group of potential users in a setting as close as possible to a real world application
[14]. We used a within-subjects experimental design [19] and in total, 12 subjects 
participated. 



3.1 Subjects 

As retrieving information via voice is still in its infancy, it would be difficult
to identify participants for our study. We therefore decided to recruit from an
accessible group of potential participants who are not new to the subject of Information
Retrieval. 7 of our participants were from the IR research group, with some degree of
knowledge of Information Retrieval, and 5 participants were research students
within the department of computer and information sciences, all with good experience of
using search engines; few had prior experience with Vocal Information
Retrieval. Our subjects participated in the experiment voluntarily. It is worth mentioning
that all participants were native English speakers.



3.2 Text Collection 

The topics we used for this experimental study were a subset of 10 topics extracted
from the TREC topic collection. Each topic consists of four parts: id, title, description and
narrative. An example of such a topic is shown in Table 1.

Table 1. An example of a TREC topic



<id> 1 

<title> Topic: Coping with overcrowded prisons 
<desc> Description: 

The document will provide information on jail and prison overcrowding and how inmates 
are forced to cope with those conditions; or it will reveal plans to relieve the overcrowded 
condition. 

<narr> Narrative: 

A relevant document will describe scenes of overcrowding that have become all too com- 
mon in jails and prisons around the country. The document will identify how inmates are 
forced to cope with those overcrowded conditions, and/or what the Correctional System is 
doing, or planning to do, to alleviate the crowded condition. 



3.3 Experimental Procedure 

The experiment consisted of two sessions. Each session involved 12 participants, one
participant at a time. The 12 participants who took part in the first session also took
part in the second session. An experimenter was present throughout each session to
answer any questions concerning the process. The experimenter briefed
the participants about the experimental procedure and handed out instructions before
each session. Each participant was given the same descriptions of 10 TREC topics in
text form. The 10 topics were in a predetermined order and each had a unique ID. The
task was for each participant to form his/her own version of each topic
in either written or spoken form, as instructed, via a graphical user interface (GUI)
on a desktop screen (written in Java). For session 1, each participant was asked to
form his/her queries in written form for the first 5 queries and in spoken form for the
second 5 queries via the GUI.

For session 2, the order was reversed; that is, each participant presented his/her
queries in spoken form for the first half of the topics and in written form for the second
half via the GUI. Each session lasted approximately 3 hours, which allowed each participant
to finish the tasks within 30 minutes; a time constraint of a maximum of 5 minutes
was also imposed on each topic. Session 2 was carried out one week after session 1.
This was because after the participants had taken part in session 1, they had
familiarised themselves with the 10 topics to some degree, which would definitely
pose a threat to the validity of our data if they worked with the same topics in session
2 immediately. By running session 2 some time after session 1, we hoped this threat
would be minimised. At the end of the experiment, each participant was interviewed
for about 10 minutes and a questionnaire was administered to each participant in order
to obtain additional information about the process by which a participant formed the
queries.



3.4 Data Capture 

We utilised three different methods of collecting data for post-experimental analysis:
background system logging, interviews and questionnaires. Through these means we
could collect data that would allow us to analyse and test the experimental hypothe- 
ses. 

During the course of the experiment, the written queries were collected and saved
in text format along with the duration of the formulation for each query after the
participants typed their queries into the query field in the GUI and clicked the “submit”
button. The duration of each written query was counted as the total time a participant
spent comprehending a topic, formulating his/her query in the query field and submitting
it. The spoken queries were recorded and saved automatically in audio format in a wav
file for each participant, along with the duration for each query. After reading a
topic, to record a query, the participant could click the “starting speaking” button,
speak his/her query into a microphone and then click “stop speaking” to terminate the
recording. Similarly, the duration of each spoken query was calculated as the total 
time a participant needed to comprehend a topic and record his/her query. 

The interviews sought to solicit participants’ comments on the GUI design and their
explanations of any exceptional behaviour the experimenter had observed during the
course of the experiment. Participants were also asked to point out the easiest and
most difficult topics in written and spoken form and the reasons for their judgments.

The same questionnaire was handed out after the completion of each session to gather
participants’ assessments of the complexity of the tasks. By comparing their answers,
we could see how their ratings of the difficulty of the tasks varied from session 1 to
session 2.

Table 2. Characteristics of WRITTEN queries

Data set                                    q1-q120
Number of queries                           120
Unique terms in queries                     328
Average query length (with stopwords)       9.54
Average query length (without stopwords)    7.48
Median query length (without stopwords)     7
Average duration (min:sec)                  02:13


Table 3. Characteristics of SPOKEN queries

Data set                                    q1-q120
Number of queries                           120
Unique terms in queries                     459
Average query length (with stopwords)       23.07
Average query length (without stopwords)    14.33
Median query length (without stopwords)     11
Average duration (min:sec)                  01:58



4 Experimental Results and Analysis 

From this experiment, we have collected 120 written queries and 120 spoken queries. 
Some of the characteristics of written and spoken queries are reported in Table 2 and 
Table 3 respectively. 

These two tables show clearly that the average spoken query is longer than the average
written query, with a ratio rounded at 2.48, as we had hypothesised. After stopword
removal, the average length of spoken queries fell from 23.07 to 14.33, a 38%
reduction, and the average length of written queries fell from 9.54 to 7.48, a 22%
reduction. These figures indicate that spoken queries contained more stopwords than
written ones. This can also be seen from the differences between the average and
median lengths of both spoken and written queries. There were no significant
differences in the time taken to formulate queries in spoken and written form.
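The stopword reduction rates quoted above follow directly from the averages in Table 2
and Table 3. The short sketch below recomputes them; the variable and function names
are ours. Note that the ratio it prints is derived from the rounded table averages
only, so it need not agree exactly with a ratio computed from the underlying per-query
data.

```python
# Average query lengths taken from Table 2 (written) and Table 3 (spoken).
written = {"with_stopwords": 9.54, "without_stopwords": 7.48}
spoken = {"with_stopwords": 23.07, "without_stopwords": 14.33}


def reduction_rate(before: float, after: float) -> float:
    """Fraction of the original length removed by stopword filtering."""
    return (before - after) / before


spoken_reduction = reduction_rate(spoken["with_stopwords"], spoken["without_stopwords"])
written_reduction = reduction_rate(written["with_stopwords"], written["without_stopwords"])

# Ratio of the two table averages (spoken to written, stopwords included).
length_ratio = spoken["with_stopwords"] / written["with_stopwords"]

print(f"spoken reduction:  {spoken_reduction:.0%}")
print(f"written reduction: {written_reduction:.0%}")
print(f"ratio of table averages: {length_ratio:.2f}")
```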

The number of unique terms occurring in the written and spoken query sets was very
small. This was because each participant worked on the same 10 topics and generated a
written query and a spoken query for each topic. Therefore, there were 12 versions of
written queries and 12 versions of spoken queries for each topic.



4.1 Length of Queries Across Topics 

The average length of spoken and written queries for each topic across all 12 partici- 
pants was calculated and presented in Fig. 1. 

In Fig. 1, the line for spoken queries is always above the line for written queries,
which shows that the spoken queries were longer than the written ones. This was
consistently the case for every topic, and was exactly what we expected to see. We
know from previous studies that the textual queries untrained users pose to
information retrieval systems are short: most queries are three words or less. With
some knowledge of information retrieval and high usage of web search engines, our
participants formulated longer textual queries. When formulating queries verbally, the
ease of speech encouraged participants to speak more words. A typical user spoken
query looks like the following:

“I want to find document about Grass Roots Campaign by Right Wing Christian 
Fundamentalist to enter the political process to further their religious agenda in the 
U.S. I’m especially interested in threats to civil liberties, government stability and the 
U.S. Constitution, and I’d like to find feature articles, editorial comments, news items 
and letters to the editor.” 

Whereas its textual counterpart is much shorter: 

“Right wing Christian fundamentalism, grass roots, civil liberties, US Constitu- 
tion.” 



Fig. 1. Average length of queries per topic (legend: written queries, spoken queries)




Fig. 2. Average length of queries per user 



4.2 Length of Queries Across Participants

We also summarised the length of queries for all 10 topics across all participants. The 
average length of queries per user is presented in Fig. 2. 

We can observe from Fig. 2 that every participant's spoken queries were consistently
longer than his/her written ones. However, for some participants the difference in
length between spoken and written queries was very small. In fact, after studying the
transcriptions of the spoken queries, we observed that the spoken queries generated by
a small portion of participants were almost identical to their written ones. The
differences in length among written queries were small and relatively stable. All
participants used a similar approach to formulate their written queries, specifying
only keywords. Experience of using textual search engines influenced the participants’
query formulation process: most popular textual search engines remove stopwords from a
query before creating the query representation. Conversely, the length of spoken
queries fluctuated considerably among participants.

We did not run a practice session prior to the experiment, for instance to give an
example of how to formulate a written query and a spoken query for a topic, because we
felt this would set up a template for participants to mimic during the experiment and
we would not be able to find out how participants would go about formulating their
queries. In this experiment, we observed that 8 out of 12 participants used natural
language to formulate their queries, which were very much like conversational talk,
while 4 participants stuck to the traditional approach of speaking only keywords
and/or broken phrases. The latter said they did not “talk” to the computer because
they felt strange and uncomfortable speaking to a machine.



4.3 Duration of Queries Across Topics 

The time spent formulating each query was measured. A maximum of 5 minutes was imposed
on each topic and participants were not allowed to work past this. All participants
felt that the time given was sufficient. There was only one occasion on which a
participant did not formulate a written query within the time limit.




Fig. 3. Average duration of queries per topic 

The average time participants spent on each topic is shown in Fig. 3. For the first
half of the topics, more time was needed to form the written queries than the spoken
ones, but the discrepancy was not as great as we expected. For each topic in the
second half, participants spent almost the same time formulating queries in written
and spoken form. From this figure, we could establish that no significant difference
existed between the two query forms in terms of duration. This appears to weaken our
claim that participants would require less time to form spoken queries, since speech
is the way people communicate with each other. However, we cannot neglect the fact
that the cognitive load on participants of speaking their thoughts aloud was also
high. Some of them commented that they had to formulate their queries fully in their
heads before speaking them aloud without mistakes. One can easily revise a textual
query in a query field, but it would be difficult for the computer to understand if
one corrected one’s words while speaking. Information retrieval via voice is a
relatively new research area and there are not many working systems currently
available. Lack of experience also put pressure on the spoken query formulation
process.



4.4 Duration of Queries Across Participants 

The duration of queries per participant is shown in Fig. 4. Some participants spent
less time on spoken queries than on written ones, whereas the reverse was true for
other participants. The variations in duration across participants were very irregular
and there were no significant differences between the durations for the two forms; we
were therefore unable to establish any strong claims. Nevertheless, the figure did
show that two thirds of the participants spent less time on spoken queries than on
written ones, whereas only one third required more time for spoken queries.




4.5 Length of Spoken and Written Queries without Stopwords Across Topics 

From the previous analysis, we know that spoken queries as a whole were definitely
longer than written queries. One could argue that people naturally speak more
conversationally, which results in lengthy sentences containing a great many function
words such as prepositions, conjunctions or articles. These words have little semantic
content of their own and chiefly indicate grammatical relationships; the information
retrieval community refers to them as stopwords. Written queries, by contrast, are
much terser and mainly contain content words such as nouns, adjectives and verbs. On
this argument, spoken queries would not contribute much more than written queries
semantically. However, when we removed the stopwords from both the spoken and written
queries and plotted the resulting average lengths against the original lengths in one
graph, as shown in Fig. 5, a very different picture emerged.

As we can see from Fig. 5, the line for spoken queries is consistently above the one
for written queries; after stopword removal, both become shorter. Moreover, the line
for spoken queries without stopwords stays above the one for written queries without
stopwords across every topic. Statistically, the average spoken query length without
stopwords is 14.33 and the average written query length without stopwords is 7.48,
which shows that the spoken queries are almost double the length of the written ones.
This substantial increase in length indicates that the ease of speaking encourages
people to express themselves not only more conversationally, but also with more
semantic content. From the information retrieval point of view, more search words
would improve the retrieval results. Ironically, for mobile information access, the
bane is the very tool that makes it possible: speech recognition. There is a wide
range of speech recognition software available for both commercial and research
purposes. High quality speech recordings might have a recognition error rate of under
10%. The average word error rates (WER) for large-vocabulary speech recognisers are
between 20 and 30 percent [2]. Conversational speech, particularly on a telephone,
will have error rates in the 30-40% range, probably towards the high end of that range
in general. In our experiment, where spoken queries are about twice as long as written
queries, even a WER of 50% would not leave spoken queries conveying less meaning than
written queries; in other words, the spoken information clearly has the potential to
be at least as valuable as written material.
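The argument about word error rates can be checked with a rough back-of-envelope
calculation. The sketch below rests on a simplifying assumption of ours that is not
made in the study: every content word is equally likely to be misrecognised, and a
misrecognised word is simply lost. Under that assumption the expected number of
surviving content words is the stopword-free query length multiplied by (1 - WER).

```python
# Average stopword-free query lengths reported in the text.
SPOKEN_TERMS = 14.33
WRITTEN_TERMS = 7.48


def expected_surviving_terms(content_terms: float, wer: float) -> float:
    """Expected number of correctly recognised content words.

    Assumes (our simplification) that errors are spread uniformly over words
    and that every misrecognised word is lost.
    """
    if not 0.0 <= wer <= 1.0:
        raise ValueError("wer must be between 0 and 1")
    return content_terms * (1.0 - wer)


for wer in (0.1, 0.2, 0.3, 0.4, 0.5):
    surviving = expected_surviving_terms(SPOKEN_TERMS, wer)
    print(f"WER {wer:.0%}: about {surviving:.2f} spoken content words survive "
          f"(written query: {WRITTEN_TERMS})")

# WER at which a spoken query is expected to retain exactly as many
# content words as an error-free written query.
break_even_wer = 1.0 - WRITTEN_TERMS / SPOKEN_TERMS
print(f"break-even WER: {break_even_wer:.1%}")
```

Under this crude model a spoken query recognised at a WER of 50% retains roughly seven
content words, which is close to the average written query; at the 20-40% error rates
quoted above it retains noticeably more.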




Fig. 5. Average length of queries across topics 



4.6 Length of Spoken and Written Queries without Stopwords Across Participants

The average length of spoken and written queries with and without stopwords across
all 12 participants is shown in Fig. 6. This graph is consistent with the result of
the previous analysis: people tend to use more function words and more content words
in speaking than in writing. This was the case for every participant in our
experiment.




5 Conclusions and Future Work 

This paper reports on an experimental study of the differences between spoken and
written queries in terms of length and of the duration of the query formulation
process; the study also serves as the basis for a preliminary speech user interface
design in the near future. The results show that using speech to formulate one’s
information needs not only provides a natural means of expression, but also encourages
one to include more semantic content. We therefore conclude that spoken queries are
entirely feasible as a means of formulating and inputting information needs.
Nevertheless, this empirical study was carried out with a small number of
participants; further studies with a larger user population are required to underpin
these results.

Information retrieval systems are much more sensitive to recognition errors when the
queries are spoken than when the documents are speech recognition output [11]. We are
fully aware of this potential threat; for future work, we would therefore like to
transcribe the recordings of the spoken queries using automatic speech recognition
software and identify an information retrieval system that can be used to evaluate the
effect of the word error rate of spoken queries, compared with written queries, on
retrieval effectiveness.

In the meantime, we are carrying out a similar experiment on Mandarin, which has a
completely different semantic structure from English. The topics being used for this
experimental study are a subset extracted from the TREC-5 Mandarin Track and the
participants are all native Mandarin speakers with good experience in using search
engines. The results obtained from this study will be compared to the ones reported
in this paper.



Abstract. This paper presents two palmtop applications: Taeneb CityGuide and Taeneb
ConferenceGuide. Both applications are centred around starfield displays on palmtop
computers, which provide fast, dynamic access to information on a small platform. The
paper describes the applications, focussing on this novel palmtop information access
method and on the user-profiling aspect of the CityGuide, where restaurants are
recommended to users based on both the match of restaurant type to the users’ observed
previous interactions and the ratings given by reviewers with similar observed
preferences.



1 Introduction 

Starfield display technology has been shown to provide quick, dynamic and easy access
to large amounts of complex data through the use of scatter-plot displays and dynamic
queries [1]. These techniques have been shown to be of great benefit when searching in
many domains, e.g. house purchases [2], movies [1] and musical pieces guide [3].
However, starfield technology has traditionally been used only on large colour screens
and is thus not widely considered suitable for small mobile devices.

Collaborative filtering has proven to be a very successful tool in many domains for
helping users select appropriate items from large collections [4] and has been used in
tourism, for example, to calculate guided tours [5]. Malone et al. [6] describe three
forms of information filtering: cognitive (often known as content-based), social (or
collaborative) and economic. Balabanovic and Shoham [7] discuss the relative merits of
content-based and social recommendation approaches: primarily, content-based
approaches are less affected by individuals with unusual tastes or by a small number
of ratings, while social recommendations naturally take more account of significant
non-content information that is likely to be missed by content-only recommendations.

This paper discusses our development of two starfield displays on palmtops - a city
guide and a conference guide. The paper goes on to propose a combination of starfield
displays with recommendation systems as a natural extension to starfield displays and
describes this in the context of our city guide. In line with Balabanovic and Shoham’s
work on recommendation systems for internet pages, we take the view that a combination
of content-based and social recommendations is likely to be most effective for tourism
applications. Section 2 of the paper discusses starfield displays on palmtops and
section 3 our collaborative filtering approaches, with section 4 concluding the paper.



2 Palmtop Starfield Displays 



In a previous project we showed that a palmtop-based starfield display was a
successful access method for a movie database despite being used on very small,
monochrome, low resolution screens [8]. In that work, we compared traditional
palmtop-style access with starfield access to a collection of movies using two
zoomable axes - year of release (x) and popularity of movie (y) - together with direct
on-screen filters for movie genre (e.g. comedy, thriller...) and film classification
certificate (e.g. U, PG...). The results confirmed our belief that starfield displays
could be used on such small screens. Figure 1 shows an example search for all non-18
certificate movies excluding comedies. As well as providing fast searching, starfield
displays provide two main benefits over traditional data access methods: dynamic
feedback and intuitive transitions from data overviews to focussed searching.




Fig. 1. PalmMovieFinder 

Dynamic feedback is supported through controls and filters over the dataset. These
controls support users in searching the database rapidly and provide easy correction
for many traditional database problems, e.g. when no data matches a query. In
traditional database searching, null queries are notoriously difficult for users to
correct - it is very difficult to slightly weaken a complex database query. In
contrast, with starfield displays the query is built up in stages and the user knows
precisely what (s)he did to cause the null query, so (s)he can quickly undo that
operation. For example, in the PalmMovieFinder a user looking for the lowest
certificate thriller would deselect all genres bar thriller and then deselect 18, 15,
12,... in decreasing order. Once there are no matches, turning the last certificate
back on shows the lowest certificate thrillers.
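The staged query building and one-step undo described above can be illustrated with a
small sketch. It is not the PalmMovieFinder code (which was written for PalmOS); the
class names, the film data and the filter vocabulary are all hypothetical and serve
only to show how a null result is corrected by undoing the last filter change.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Movie:
    title: str
    genre: str
    certificate: str


@dataclass
class StagedQuery:
    """Filters are switched off one at a time; each change can be undone."""
    movies: list
    genres: set
    certificates: set
    history: list = field(default_factory=list)

    def _selected(self, kind: str) -> set:
        if kind == "genre":
            return self.genres
        if kind == "certificate":
            return self.certificates
        raise ValueError(f"unknown filter kind: {kind}")

    def matches(self) -> list:
        return [m for m in self.movies
                if m.genre in self.genres and m.certificate in self.certificates]

    def deselect(self, kind: str, value: str) -> list:
        selected = self._selected(kind)
        if value in selected:
            selected.discard(value)
            self.history.append((kind, value))
        return self.matches()

    def undo(self) -> list:
        if self.history:
            kind, value = self.history.pop()
            self._selected(kind).add(value)
        return self.matches()


# Hypothetical data set.
movies = [
    Movie("Film A", "thriller", "18"),
    Movie("Film B", "thriller", "15"),
    Movie("Film C", "comedy", "PG"),
    Movie("Film D", "thriller", "15"),
]
query = StagedQuery(movies, {"thriller", "comedy"}, {"18", "15", "12", "PG", "U"})

query.deselect("genre", "comedy")            # deselect all genres bar thriller
lowest = []
for certificate in ["18", "15", "12", "PG", "U"]:   # decreasing order
    if not query.deselect("certificate", certificate):
        lowest = query.undo()                # null query: turn the last one back on
        break

print([m.title for m in lowest])
```

Because each deselection is recorded, the user (or here the loop) always knows which
single operation emptied the result set, which is the property the text contrasts with
weakening a complex database query.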

As highlighted in the HomeFinder [2], users can use the searching method to get an
overview of the data and rapidly focus in on areas of interest. In the HomeFinder,
users are given filters including type of property and house price on a geographic map
of an area. When lowering the maximum price filter for a selected type of house, users
rapidly learn the expensive areas of a city because they are the first to disappear.
This kind of clustering or data overview is very hard to achieve with non-visual
interfaces. If a user identifies a suitably priced area near his/her office, (s)he can
then zoom into that area and naturally restrict further queries to that geographic
area.

We have developed two starfield displays on Palm devices: the CityGuide tourist
information application, based around a geographic map using a starfield display to
show tourist attractions around a city, and the ConferenceGuide, based around a
timetable visualisation of a conference. Both applications have been developed for
high resolution (320x320) PalmOS devices (the CityGuide in colour, the ConferenceGuide
in greyscale) and were developed using a combination of PalmOS C and Sybase database
storage.



2.1 CityGuide 

The CityGuide application is designed around a map-based starfield display (cf.
HomeFinder [2]) to help tourists find attractions around a city. Our current
implementation is based on a guide to Glasgow and contains an extensive restaurant
guide with some information on cinemas, theatres and pubs. Brown and Chalmers [9]
state that “tourists deliberately make plans that are not highly structured and
specific, so that they can take advantage of changing circumstances”. Our aim in
developing this application of mobile starfield technology is to support tourists’
unstructured searching of a city centre.

Fig. 2 shows a map of Glasgow city centre with an overlay of all restaurants,
represented as squares. The map interface offers typical electronic map features such
as zooming and panning: a user tapping on the display in Fig. 2 over, say, Central
Station would zoom the map into that location, and a further tap zooms further into
that location. The user can then tap on a square attraction icon to see its name and
tap again for more details on that attraction. Users can pan the display by dragging
with their stylus and zoom out by clicking on the zoom icon.

On the top right of the display is a set of dynamic filters for controlling which
points are shown on the starfield display, here showing the type of attraction as
restaurant, the restaurant-type/menu filter and the restaurant price filter. Single
choice filters are controlled by a pop-up menu, for example price in Fig. 3B. Due to
limitations in PalmOS, multiple choice filters are controlled via a pop-up window, for
example the restaurant food-type filter shown in Fig. 3A. The results of a query are
displayed directly on the starfield display as a revised set of attraction icons that
match the current set of filters.

Brown and Chalmers [9] state that “when choosing where to go to, it is often safer 
to pick an area with more than one potential facility”. Providing this kind of clustering 
information is one of the traditional strengths of starfield displays. Fig. 4 shows a brief 
interaction after applying the filters shown in Fig. 3 - here the user zooms into a 
promising looking area of the map, clicks on one restaurant then clicks on the restau- 
rant name to get full details. 

Fig. 2. Restaurant guide to Glasgow 1

[Fig. 3, panel A: multiple-choice food-type filter with check boxes for Scottish, Far
East, French, Seafood, Vegetarian, Indian, International and Italian, and buttons All,
None, OK and Cancel]

Fig. 3. Restaurant query filters (A: restaurant filter, B: price band filter, C: result of filters)

Users can mark an attraction as being on “My List” (similar to favourites/book- 
marks in web browsers, see last image in Fig. 4) and later filter to show only attrac- 
tions that have been added to this list. Users can also write their own reviews (see last 
image in Fig. 4), and have these published for others to read. On the bottom right of 
the map display is a scale bar for relevance and a list view - both of which are dis- 
cussed later. 




1 Colour images are available at www.taeneb.com 

[Fig. 4 screens: details for Cafe Mao, 84 Brunswick Street, Merchant City, Glasgow,
G1 1ZZ; rating “Good”; review text “Star food and decently priced. Lovely spacious
restaurant, great for business lunches.”; controls My List?, Back and Publish]

Fig. 4. Zooming into full details on Cafe Mao after applying Fig. 2 filters



2.2 ConferenceGuide

Based around a timetable starfield display, the ConferenceGuide initially shows users
an overview of a day at a conference (see Fig. 5 for a sample day from our trial
conference - EMAC 2003). Here parallel streams are shown as vertical columns, with
plenary sessions (in the case of Fig. 5, only breaks) shown as horizontal bars across
all columns - closely reflecting the standard PalmOS Date Book application. Clicking
on a session shows its name in the info box at the bottom of the screen, with a
further click giving full session details (e.g. session name and theme together with a
list of talk titles, speakers and abstracts).



[Fig. 5 screen: day view titled “EMAC: 22 May 2003”, with sessions drawn as boxes in
parallel columns]

Fig. 5. EMAC conference timetable overview



Filters are provided on session theme and expected audience (not shown in Fig. 5).
The session theme filter is initially set for all themes and users can limit the
filter to show only themes they are interested in (e.g. “Consumer Behaviour” or
“Social Responsibility”). As with the CityGuide, users can add a session to their “My
Sessions” list for later filtering to show only these sessions (a helpful tool for
planning time at a multi-parallel conference) and view a textual version of a day’s
events. The ConferenceGuide also supports inter-delegate communication through
messaging and discussion forum services associated with each conference session.



2.3 Implementation Details 

Both applications were developed on PalmOS devices (primarily Sony Clies) using
Metrowerks C. Extensive use was made of Sybase databases to hold the majority of the
data and to manage synchronisation between palm, desktop and internet versions of the
databases. Synchronisation was primarily via a physical connection to a networked PC,
but the ConferenceGuide was tested for "virtually continuous" wireless synchronisation
using a wi-fi enabled Palm.



2.4 User Trial 

Prototypes of the CityGuide and ConferencePlanner were distributed to delegates at the
EMAC 2003 conference on “Marketing: Responsible And Relevant?”. Twenty delegates were
selected by the conference organisers for the trial and were each given a Sony PalmOS
Clie greyscale device for the duration of the conference. The city guide was populated
in advance with a restaurant, pub and cinema guide (including “what’s on”
information). Some restaurants were populated with reviews, but both trial users and
other delegates (through combined paper review forms and prize draw entries) were
encouraged to write new reviews. At the conference venue the participants were
provided with access to synchronise the software and were encouraged to do so at least
daily. Feedback was gathered after two days of the trial in the form of informal
interviews.

A lot of interest in the application was shown by people who had experience of
palmtops and by those interested in high-tech applications. People who had devices
other than PalmOS ones, mostly PocketPC-based, expressed considerable disappointment
that they could not use their own palmtop. One user found our PDA device in general
too small and difficult to read and gave up using the system after the first day of
the trial. However, in general the users found the interface easy to use and
intuitive - even those who had never used a palmtop before. They used the CityGuide
mostly for searching for restaurants. They found the starfield interface an easy way
of finding a restaurant and the review system very helpful when selecting a restaurant
(in particular, delegates found it helpful to be able to read the opinions of other
delegates attending the conference, and most users added their own reviews).

The overall feedback was positive with many suggestions of extending the data 
content (such as adding museums, galleries, train timetables etc.) and connectivity of 
the device to give live update features. 



3 Palmtop Collaborative Filtering

In this project we investigated combining a recommendation system with starfield
displays to provide a filter on “relevance” in addition to the more traditional
database style filters. Our recommendation system is based partly on content-based
matching between user profiles and attraction profiles and partly on a social element
from similar users’ ratings of, say, restaurants. To build each user's profile we use
a combination of implicit and explicit ratings. Nichols [10] highlights the problems
of achieving a satisfactory number and quality of explicit ratings, where users are
requested to explicitly score each, in his case, document: “the act of rating alters a
user's behaviour from their normal pattern of reading” and that “unless the user
perceives some benefit for participating in the system then they have an incentive for
leaving”. Within the tourism domain, local residents have a clear incentive for
writing reviews of restaurants that their friends/colleagues may benefit from, and
thus they will benefit from their colleagues’ and friends’ reviews. However, for
visitors to a city there is little incentive to write reviews as there is little
direct link between gain and effort. In contrast, implicit ratings are developed by
simply monitoring a user’s behaviour with the system. Nichols identifies the following
potential types of implicit information: purchase, assess, repeated use, save/print,
delete, refer/cite, reply, bookmark, examine/read, consider, glimpse, associate, and
query. Most of these categories are used to build the user’s profile in the CityGuide,
as discussed below.

This section reviews our model for combining explicit information with implicit 
monitoring of the user's interaction and discusses how these are used to drive a rele- 
vance filter on a starfield display. 



3.1 User Profile Building and Direct Matching 

For each filter on the city guide we keep an individual user weight for each filter
value (e.g. Italian for the food-type filter). When our prototype system is started,
users are asked to fill in a brief questionnaire for each food type (e.g. how much
they like Italian food on a 5-point Likert scale from “hate” to “love”). These initial
scores are given a weight of 0 (hate) to 50 (love) and are then adjusted based on
implicit ratings.

Following a scheme similar to Nichols and inspired by relevance feedback techniques in
information retrieval [e.g. 11], scores for each matching criterion (e.g. Far Eastern
and seafood) are adjusted for many user actions in the interface. Currently the
weights are adjusted as follows (Nichols’s categories shown in parentheses); when a
user:

• writes a very good review of a restaurant of that type: +5 (assess)
• writes a good review of a restaurant of that type: +3 (assess)
• writes a medium review of a restaurant of that type: +1 (assess)
• writes a very bad review of a restaurant of that type: -1 (assess)
• filters with this type turned on: +2 (cf. query)
• adds a restaurant of this type to “my attractions”: +3 (bookmark)
• gets more details of a restaurant of this type: +1 (examine)
• reads reviews of a restaurant of this type: +1 (examine)

In Nichols’s scheme reply, examine and glimpse were considered to be time based - the
longer a user spent examining, the more important the document. For a mobile setting
we felt this to be unreliable as levels of interruption were likely to be much higher,
thus more frequently giving misleading measures of, say, how long a user spent reading
a restaurant review. A fixed increment was therefore used instead of a time-based
measure; future experimentation is needed to investigate this decision.
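The fixed-increment profile update listed above can be sketched as follows. The
increments are those given in the list; everything else is an assumption made for
illustration: the action names, the function names, and in particular the linear
mapping of the five Likert points onto the 0 to 50 range (the text states only the end
points, 0 for “hate” and 50 for “love”).

```python
# Assumed linear mapping of the 5-point Likert answers onto 0..50.
LIKERT_TO_WEIGHT = {1: 0.0, 2: 12.5, 3: 25.0, 4: 37.5, 5: 50.0}

# Fixed increments per user action (values from the list in the text;
# the action keys are hypothetical names).
INCREMENTS = {
    "review_very_good": 5,   # assess
    "review_good": 3,        # assess
    "review_medium": 1,      # assess
    "review_very_bad": -1,   # assess
    "filter_on": 2,          # cf. query
    "add_to_my_list": 3,     # bookmark
    "view_details": 1,       # examine
    "read_reviews": 1,       # examine
}


def initial_profile(questionnaire: dict) -> dict:
    """Build a starting profile from Likert answers (1 = hate ... 5 = love)."""
    return {food: LIKERT_TO_WEIGHT[answer] for food, answer in questionnaire.items()}


def record_action(profile: dict, food_type: str, action: str) -> None:
    """Adjust the weight for one food type by the fixed increment for an action."""
    profile[food_type] = profile.get(food_type, 0.0) + INCREMENTS[action]


profile = initial_profile({"Italian": 5, "Seafood": 2})
record_action(profile, "Italian", "filter_on")
record_action(profile, "Italian", "review_very_good")
record_action(profile, "Seafood", "review_very_bad")
print(profile)
```

Because every action contributes a constant amount, an interrupted reading session
changes the profile no more than an uninterrupted one, which is the motivation given
above for avoiding time-based measures on a mobile device.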

Fig. 6 shows a sample rolling user profile after the system has been used for some time.
Users would not normally see this information, but it highlights how the system has
developed a simple model of the user’s tastes. Here the user has viewed/reviewed more
International and Far Eastern restaurants than other categories, so we assume (s)he has
a preference for these categories and a relative dislike of sea-food.



Current RUP

Food type        Score
Scottish         81
French           80
Vegetarian       82
International    108
Far Eastern      108
Seafood          61
Indian           80
Italian          100

Fig. 6. Sample Rolling User Profile



3.2 End User Reviews 

As shown earlier (Fig. 4), one of the core elements of the CityGuide is community
reviewing, where all users can read and write reviews. When a review is submitted, the
author’s current Rolling User Profile is submitted with that review. This allows reviews
to be measured for closeness to the current user’s views (for example, someone
who hates expensive Indian food will have a different view of an Indian restaurant from
someone who loves all Indian food) 2.

Inspired by free text information retrieval techniques [e.g. 12], we calculate for each
user a personalised rating, PARR'_t, for restaurant R_t as follows:

\[
PARR'_t = \frac{\sum_i \cos(P_{a_i}, P_u) \cdot R_{a_i}}{\sum_i \cos(P_{a_i}, P_u)}
\]

where

P_u = current profile of the user for whom we are personalising
P_{a_i} = author’s rolling user profile at the time of submitting review i
R_{a_i} = author’s rating for restaurant R_t in review i (scaled to between 0 and 1)
cos = cosine function for matching document vectors (see, e.g. [12])

PARR'_t is a value between 0 and 1

summations are carried out over all reviews i for restaurant R_t



2 The current implementation only supports food-type. 

The final personalised review value is given as follows, to reduce the effect of a single
review and to handle the case of zero reviews:

PARR_t =

• 0.5 if there are no reviews

• (PARR'_t + 0.5) / 2 if there is only one review

• PARR'_t if there are two or more reviews.
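
As a minimal sketch of the calculation just described, the following Python computes the
similarity-weighted rating and then applies the three cases for zero, one, or several
reviews. The function names and the dictionary representation of profiles are
assumptions made for illustration; the neutral fallback used when no author profile
overlaps with the user's profile is also an assumption, since the text does not cover
that case.

```python
import math


def cosine(p, q):
    """Cosine similarity between two profiles given as {food_type: score}."""
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in set(p) | set(q))
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def personalised_rating(user_profile, reviews):
    """reviews: list of (author_profile, rating), with rating scaled to 0..1."""
    if not reviews:
        return 0.5  # no reviews: neutral value
    weights = [cosine(author, user_profile) for author, _ in reviews]
    total = sum(weights)
    if total == 0.0:
        # Assumption: no similarity information at all, so stay neutral.
        raw = 0.5
    else:
        raw = sum(w * rating for w, (_, rating) in zip(weights, reviews)) / total
    if len(reviews) == 1:
        return (raw + 0.5) / 2  # damp the effect of a single review
    return raw


user = {"Italian": 100.0, "Seafood": 0.0}
similar_author = {"Italian": 50.0, "Seafood": 0.0}
different_author = {"Italian": 0.0, "Seafood": 50.0}
rating = personalised_rating(user, [(similar_author, 1.0), (different_author, 0.0)])
```

In this example the second review carries no weight because its author's profile has
nothing in common with the user's, so the result is driven entirely by the first review.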

Given the user ratings shown in Fig. 6, Fig. 7 shows Glasgow Restaurants rated by 
PARR given the current database of community reviews. This textual view is a useful 
complement to the starfield display for when location is not a prime issue in the user’s 
selection - all filters work identically on the list view and map view and the user can 
rapidly flip between the two views. 



Name                     PARR

Frango                   100
Fratelli Sarti           100
Gamba                    100
Windows - Carlton Ge     100
Cafe Gandolfi            94
Cafe Mao                 93
Arta                     88
Mussel Inn               88
The Willow Tea Room      88

Fig. 7. Glasgow Restaurants rated by PARR



3.3 Combining Review Ratings and User Profile 

The Personalised Rating (PARR) uses explicit review ratings of restaurants which are 
biased towards reviews from people with similar profiles to the user. However, the 
user’s rolling profile does not directly impact these scores (it simply impacts the belief 
given to others’ reviews). In contrast, the Rolling User Profile (RUP) does not take 
into account restaurant reviews. 

These two scores are simply combined into the Combined Attraction Score (CAS):

\[
CAS_t = RUP_t \times PARR_t
\]

While this scheme appears to work well, longitudinal studies are required to collect 
substantial amounts of data in order to formally experiment with different recommen- 
dation approaches and refinements of our current approach. 



3.4 Combination Filtering 

Bringing together collaborative community reviews and starfield filtering, we have
added a “relevance filter” to the starfield map display. On the bottom right of the
display a set of bars represents the openness of the relevance filter (from very open to
very restrictive). Fig. 8 shows all restaurants in Glasgow on the left, a reasonably tight
relevance filter in the centre and only the best match on the right.

Fig. 8. Relevance Filter on wide, medium and tight settings



The relevance filter is driven by the Combined Attraction Score (CAS) to recom- 
mend restaurants based on both their review ratings and their match to the user’s roll- 
ing profile. This filter works in combination with other filters so, for example, the 
tightest relevance filter will show the highest CAS ranked attraction that matches the 
current restaurant-type and price filter settings. 
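
One way the relevance filter could be driven by the CAS is sketched below in Python. The
text fixes only the two extremes (the most open setting shows everything, the tightest
shows the single best match); the number of bar positions and the proportional rule used
for intermediate settings are assumptions, as are the function names and sample values.
The candidates passed in are assumed to have already passed the other filters
(restaurant type, price).

```python
import math


def combined_attraction_score(rup_score, parr):
    """CAS = RUP * PARR, as defined in Section 3.3."""
    return rup_score * parr


def relevance_filter(candidates, openness, levels=5):
    """Keep the best-scoring candidates for a given filter setting.

    candidates: list of (name, rup_score, parr) tuples
    openness:   1 (tightest) .. levels (fully open); `levels` is an assumed
                number of bar positions
    """
    ranked = sorted(
        candidates,
        key=lambda c: combined_attraction_score(c[1], c[2]),
        reverse=True,
    )
    if not ranked or openness >= levels:
        return [c[0] for c in ranked]
    if openness <= 1:
        return [ranked[0][0]]  # tightest setting: only the best match
    # Assumption: intermediate settings keep a proportional share of the list.
    keep = max(1, math.ceil(len(ranked) * openness / levels))
    return [c[0] for c in ranked[:keep]]


# Hypothetical sample data, not taken from the paper's database.
sample = [
    ("A", 108, 0.90),
    ("B", 61, 1.00),
    ("C", 100, 0.50),
    ("D", 80, 0.88),
]
best_only = relevance_filter(sample, openness=1)
```

Ranking rather than thresholding guarantees that the tightest setting always shows
exactly one attraction whenever anything matches the other filters.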

While not currently implemented, we envisage a similar system for conference
guides that would help to identify both relevant sessions and relevant individual papers
in conferences with many parallel sessions.



4 Discussion 

In this paper we have presented two novel starfield display implementations on palm- 
top computers: the CityGuide, based around a tourist attraction guide on a geographic 
plan of Glasgow, and the ConferenceGuide, based around a timetable for a multiple 
parallel session conference. Both interfaces have proven easy to use in a user trial of 
conference delegates visiting Glasgow. 

The paper has also proposed a combined content and social recommender system 
for restaurant reviews based on a hybrid explicit/implicit rating system. This rating 
system is then used to drive a novel interaction tool, the relevance filter, on a starfield 
display so that users can directly control how much the system recommendations are 
taken into account when looking for items on the starfield display. 

We are currently planning more formal user and technical evaluation of the algorithms
and interface. Other contextual inputs, such as weather, the user’s current context
(cf. [13]) and the distance of an attraction from the current location (cf. [14]), are also
being investigated as possible inputs to the recommendation system for general tourism
attractions (e.g. walks in beautiful but distant botanic gardens tend to lose their appeal
in heavy rain). Distance is likely to be more useful in textual lists (e.g. Fig. 7), as
starfield displays naturally support zooming into a sub-area of the map; the interaction
between these two views is also being investigated. We are also investigating possible
improvements to the interface through using recommendations to guide the application
of labels to some attractions (cf. [15]).

In conclusion, starfield displays have been shown to be successful on small devices,
and combining them with a recommendation system provides a powerful information
access interface for small handheld devices.



Abstract. The paper presents the guidelines of a project of three Italian Universities
(Bologna, Siena, Trento) whose aim is to investigate the use of mobile computing
technologies to support learning processes in a University context. The project covers
three main areas. The first area is concerned with finding effective models for mobile
learning. The second regards the evaluation of learning processes in mobile learning
environments. The third focuses on the technological aspects of mobile learning and on
their integration with e-learning systems and, more generally, with the information
systems of academic institutions. The project has its foundations in the availability of
significant experience of real e-learning processes, and in the availability of the
source code of an e-learning system developed in previous projects and currently used
by different faculties, and of the newer platform that gathers the experience obtained
in the past.



1 Introduction 

Mobile learning is a field which combines two very promising areas: mobile
computing and e-learning. Mobile learning can be considered as any form of learning
(studying) and teaching that occurs in a mobile environment or through a mobile
device, such as cellular phones, Personal Digital Assistants (PDAs), smartphones, tablet
PCs, etc. On the other side of mobile learning we have e-learning, i.e. every educational
process assisted by computers through networks, and the Internet in particular.
M-learning has been considered as the future of learning, or as an integral part of any
other form of educational process in the future.

As m-learning is quite a new domain, a lot of work and research is presently going
on. Specifically, people are trying to understand:

• which learning models can help to obtain better learning processes when
communication is mediated by mobile devices, and how student mobility affects
the learning process;

• how it is possible to evaluate the efficiency and effectiveness of learning processes
based upon mobile technologies, given the physical limitations of mobile devices;

• which services are useful for mobile devices, and which enabling technology
can affect the wide diffusion of mobile learning.

A mobile learning educational process can be considered as any learning and
teaching activity that is possible through mobile tools, or in settings where mobile
equipment is available. National and international research in the m-learning field
is geared towards some lines that we shall overview here. The different devices that
exist, and all the devices that are coming onto the market, with their limitations and
advances, provoke different ideas for applying them to learning; thus each device
can mean a different kind of m-learning. Among the open problems, some relate to the
pedagogical use of mobile devices. Since the term m-learning first appeared, some
research has been done to investigate the cognitive and pedagogical aspects.

Investigation has also been done on how useful mobile computing devices could
be for reading or for workplace activities [1], on the basis of activity theory.
Some authors [2] try to give directions to application designers on the areas where
mobile devices should be most useful. Others [3] try to reach conclusions
by analyzing the theories of adult informal learning. A few papers underline some
interesting positive sides of using new technologies, i.e. the participants are excited
and want to try “new” things.

Some findings show that introducing new forms of teaching (even if this means just
using a standard tool for drawing on a PDA) makes students spend more time working
on that subject compared to other subjects [4]. The current evaluations and
analyses of m-learning projects show many positive results. On the other hand, there
are some doubts as to whether this excitement is a temporary side effect. Most
researchers think ([5] [6]) that PDAs and other mobile devices should be seen as an
extension of, rather than a replacement for, the existing learning tools. Moreover, not
all kinds of learning content and/or learning activities are appropriate for mobile
devices [7].

This paper presents our view regarding the topic of mobile computing. In particular,
we present a project of our three Universities in which we want to use an
existing Learning Management System and adapt it to the needs of mobility, having
the source code of the system available. This mobile platform will be used principally
to test new models for learning in mobile settings and tools for the assessment of
learning processes through the use of mobile technologies. These objectives will be
pursued through:

• Adoption of a well-tested e-learning platform adapted to the usage of mobile
devices

• Implementation of mobile computing services in a University setting

• Study of learning models linked to mobile technologies

• Study of learning evaluation models based on an m-learning environment

• Design and development of Learning Objects suited to mobile learning, together
with services for evaluating their effectiveness

• Experimentation with the prototypes built, in real learning processes

The paper is organized as follows: first we present the state of the art in mobile
learning, then we briefly present the three elements that in our opinion help to
build a mobile learning environment, i.e. models, evaluation systems and back-office
tools. Next we present the guidelines for studying new models for learning processes
in mobile settings, and one approach for the evaluation of these processes. Finally,
the problems faced and choices made regarding the adaptation of a Learning
Management System to mobile needs are outlined.



2 State of the Art in Mobile Learning 

The state-of-the-art in mobile learning research is heavily conditioned by the features 
of the devices available on the market. Different user interfaces, capabilities and con- 
nectivity may generate different ideas for possible learning applications: each single 
device can mean a different way to “m-learn”. We shall here review the main trends 
and indicate some of the relevant papers in the field, with special attention to the 
themes that are more closely related with the aim of the present paper. A more exten- 
sive analysis of the state of the art can be found in [8] [9]. 

Among the open problems, some relate to the pedagogical use of mobile devices.
Some research has investigated the cognitive and pedagogical aspects. Investigation
has also been done on how useful mobile computing devices could be for reading
or for workplace activities [1], on the basis of activity theory. Some
authors [2] try to give directions to application designers on the areas where
mobile devices should be most useful. Others [3] try to reach conclusions by
analyzing the theories of adult informal learning. A few papers underline some
interesting positive sides of using new technologies, i.e. the participants are excited
and want to try “new” things. Some findings show that introducing new forms of
teaching (even if this means just using a standard tool for drawing on a PDA) makes
students spend more time working on that subject compared to other subjects [4].
The current evolution and the analyses of m-learning projects show many
positive results. On the other hand, there are some doubts as to whether this excitement
is a temporary side effect. Most researchers think ([5] [6]) that PDAs and
other mobile devices should not be seen as a replacement for existing learning tools,
but rather as a new and different opportunity. Moreover, not all kinds of learning
content and/or learning activities are appropriate for mobile devices [7].

People are experimenting with the application of m-learning to different fields; a
promising one is language learning. At the Stanford Learning Lab [10] mobile learning
has been explored by developing prototypes that integrate practicing new
words, taking a quiz, accessing word and phrase translations, working with a live
coach, and saving vocabulary to a notebook. They envisioned that a good approach
would be to fill gaps of time with short (from 30 seconds to 10 minutes) learning
modules, in order to use the highly fragmented attention of the user while on the
move. The research indicates some very useful directions, such as the length of the
learning materials and the personalization of interaction, and points to user frustration
and a reduced perception of the learning materials caused by poor technological
implementation. In the same field an ongoing project [11] aims at porting to mobile
systems an ad-hoc language-learning system developed for the special needs of
an Italian bilingual region, where every public officer is supposed to be fluent in
Italian and German. One problem investigated in this context is that of anticipating
the user’s needs and pre-caching the needed content when a cheap and fast connection
(such as a direct connection via cradle) is available, since the whole material is too
large to fit in a small palmtop device.

Many authors approach m-learning in the context of life-long learning. One of the
biggest initiatives in this domain is the HandLeR project [12] (University of
Birmingham). The project attempts to understand in depth the process of learning in
different contexts and to explore lifelong learning. The stress is on communication
and on human-centred systems design. Similar in some concepts to HandLeR is the
project undertaken at the Tampere University of Technology (Finland) [13], where
PDAs are used for the mathematical education of children. The study content is presented
in the form of a game in which the pupils can communicate and help each other, and the
electronic device is used to measure the students’ average knowledge level and to
adapt the speed of presenting new material to the learners’ ability.

One of the most straightforward applications of mobile devices as an educational
support tool is messaging. At Kingston University (UK) an experiment
was undertaken to research the effectiveness of a two-way SMS campaign in the
university environment [14,15]. The team developed a system that sends SMS messages
to students registered to the service about their schedule, changes to it, examination
dates and places, students’ marks, etc. The conclusion of the experiment was
that, in certain scenarios where a certain type of response is required, students
preferred SMS as a medium to e-mail or web-based announcements. SMS could be used
efficiently in education (m-learning) as a complementary medium. As the technology
improves (e.g. EMS and MMS, with potentially more user-friendly interfaces) the
potential increases too. For this reason, as explained in the next sections, we decided
to include in our experimentation the management of SMS from teachers and
administrative staff to students as one of the approaches to info-mobility.

Also at the University of Helsinki the LIVE (Learning In Virtual Environment)
experiments, made with an SMS system and with WAP phones, were very positive [16].
The project went on to introduce digital imaging and the sharing of photos between
the participants (teachers). The conclusion was that the introduction of MMS and other
3G services on a large scale is very likely to lead to more and more possibilities for
m-learning. Another project [17], on the evaluation of a Short Messaging System (SMS)
to support undergraduate students, was carried out at Sheffield Hallam University. The
implemented system was again not for learning itself but for managing learning
activities (to guide, prompt and support the students in their learning). The findings
were overwhelmingly positive, with students perceiving the system to be ‘immediate,
convenient and personal’. Positive results were also reported following a survey in
Norway: almost 100% of the students in that University have cell phones, and an SMS
system would be widely accepted [18]. Once again an SMS system was considered as a
means to spread information about lectures and classes, corrections to the schedule,
etc. In certain cases students find it more convenient than e-mail or the WWW, as the
information always arrives on time.

These projects raise some very important issues to be considered in further research
in the mobile learning domain. One is that current technology provides sufficiently
powerful instruments to support some new forms of auxiliary learning tools. They also
show the enthusiasm of students for accepting such new technologies.

Several m-learning projects focus on how to apply e-learning techniques and
content on mobile platforms. The UniWap project ([19] [20] [21] [22]) concentrated on
testing the use of WAP technology in higher education, by exploring the process of
creating an operating environment for studying and teaching through smart-phones
and WAP phones. One phase of the project was to create some working prototypes
(course modules) and to investigate the problems and the value of such courses. The
positive results encountered (modules that were easy to develop, willingly accepted
and widely used) encouraged the team to continue investigating newly emerging
technologies: digital imaging with mobile devices, 3G, etc. In the Ultralab M-Learning
project the team is producing m-learning materials for people with literacy and
numeracy problems [23], [24]. Great potential was found from the cognitive and
pedagogical point of view, even when using simple development tools (Macromedia Flash).

“From E-learning to M-Learning” [7] is a long-term project that aims at creating a
learning environment for wireless technologies by developing course materials for
a range of mobile devices. The authors discuss the device characteristics that are
suitable for learning and highlight analogies and differences between e-learning,
d-learning (distance learning) and m-learning. They also try to predict which methods
and technologies should be used for successful m-learning.

Tourist and museum guides are often considered to be applications in the mobile
learning domain. They usually rely on the newest technologies, such as location
discovery via GPRS, radio frequency, etc. However, we consider them a separate
application field and therefore will not discuss them in this context. Also, due to
space constraints we cannot discuss the very interesting approach of using mobile
devices in the framework of collaborative and problem-based learning. The interested
reader can find indications and a short discussion of these topics in [9].

In conclusion, the overall view of the existing research work and projects in the
m-learning domain shows that it most probably applies best to processes where specific
knowledge should be retrieved/accessed at a certain moment, where discussions in
distributed groups (e.g. brainstorming) take place, where data is collected or utilized
“in the field”, and where context information is strongly related to the learning content.
The nature of mobile devices, with their small screens and poor input capabilities,
leads to the assumption that they cannot replace standard desktop computers or
laptops. However, the same properties can make them efficient in the learning domain, if
certain constraints are kept ([7] [17] [25] [26]):

• Short modules (max 5-10 minutes). Users should be able to use their small
fragments of waiting time (e.g. waiting for a meeting or while travelling on a train) for
learning, such as reading small pieces of data, doing quizzes or using forums or chat
for finding answers to “in the field” questions.

• Simple, fun and value-adding functionality. The limited computational power and
the other properties of mobile devices (as they are today) make it difficult to use
complex and multimedia content. Users should find it more interesting, necessary
or useful (or at least equally so) to study using the m-learning system in a
5-minute break than to play a game on the same device.

• Area/domain-specific content, delivered just in time/place. Mobility should
bring the ability to guide and support students and teachers in new learning
situations when and where it is necessary. The dependency of the content can be
relative to location context (i.e. the system knows the location where the learner
resides and adjusts to it), temporal context (i.e. the system is aware of
time-dependent data), behavioural context (i.e. the system monitors the activities
performed by the learner and responds to them by adjusting its behaviour) and
interest-specific context (i.e. the system modifies its behaviour according to the user’s
preferences). Of course, a mix of these contextual dependencies is possible and likely.
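
To make the constraints above more concrete, the following Python sketch shows how a
system might choose learning modules that fit the learner's available time, location
and interests. It is purely illustrative: the `Module` fields, the `select_modules`
function and the sample data are hypothetical and do not describe any of the cited
systems. Only the 10-minute upper bound on module length is taken from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Module:
    title: str
    minutes: int                    # expected time to complete
    location: Optional[str] = None  # None means usable anywhere
    topics: frozenset = frozenset()


def select_modules(modules, free_minutes, location, interests, max_minutes=10):
    """Return modules that fit the time slot and place, best interest match first."""
    limit = min(free_minutes, max_minutes)  # short modules only
    fitting = [
        m for m in modules
        if m.minutes <= limit and m.location in (None, location)
    ]
    # Interest-specific context: rank by overlap with the learner's preferences.
    return sorted(fitting, key=lambda m: len(m.topics & interests), reverse=True)


catalogue = [
    Module("Vocabulary quiz", 5, None, frozenset({"german"})),
    Module("Museum worksheet", 8, "museum", frozenset({"art"})),
    Module("Full lecture", 45, None, frozenset({"german"})),
]
on_the_train = select_modules(catalogue, 10, "train", {"german"})
```

Temporal and behavioural context could be handled in the same way, by adding further
conditions to the filter or further terms to the ranking key.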



3 The Three Elements of Building a Mobile Learning Environment

As said in the introduction, the project has three key elements. Firstly, we
are interested in analyzing and viewing the system as a whole, and thus in researching,
whenever possible, models that allow us to identify the relationships that connect
those elements, as well as their knowledge value and reach.
The concept of model therefore becomes the basis for connecting the learning process
with the languages, methods and tools that are employed to implement and
experiment with Virtual-Real Learning Communities. Such communities should deliver
evaluations of the results of the learning process and objective measurement parameters
which are (possibly) independent of the teaching contents.

A second but not secondary issue is concerned with how to evaluate the m-learning
tools and their model as a function of the quality they induce in the learning processes.
Talking about good quality in distance learning is undoubtedly not an easy task. It is
not easy for various reasons, first of all because the debate is not yet closed on
what is meant, in a more general sense, by the quality of a formative intervention,
with all that this involves: didactic effectiveness, social and professional
impact, investment, etc. We take quality to mean not so much excellence
as the management of a continuous process that brings the real effect (what
has been learned) as close as possible to the desired effect (for instance, what one
wishes to be learned). We call such systems closed ring; a key element of this kind of
systematic realignment is constant monitoring aimed at evaluating both the
users and the whole process. The new-generation system which we intend to
develop is based on the interaction of all the parts of the process, so that the
provider of the formative action can monitor the process and regulate it, when
necessary, to redirect it adequately toward the desired effect.

A key element for this is constant monitoring, whose aim is to evaluate both the
users and the whole process. The new-generation tool that we intend to develop is based
on the interaction of all process components, so as to allow tutors to monitor and steer
the process. In this way it will be possible to achieve better coherence with the
stated objectives, making it easier to reach the desired goals. In more detail,
the evaluation in the proposed system is expressed in functionalities which refer to
various kinds of assessment. The first and simplest functionality is self-evaluation,
which must be understood as a complement to an educational process. Self-evaluation
is not sufficient to guarantee the success of an educational process; in fact,
not all students are able to manage it themselves in an effective way. We would
therefore like to consider some other assessment strategies. The evaluation process
assumes that a good evaluation is not reduced to the administering of a final test and
the production of a judgment or, more simply, of a mark. Assessment must precede,
follow and direct the whole formative process. This means that the system obtains
information about the students before a course begins (using previous relationships
with the same student or a diagnostic test), during its development (through the
analysis of the links and documents chosen by the students, explicit preferences and
formative tests) and at the conclusion of unitary subject sections.

A major complexity resides in the difficulty for computers of semantically
interpreting sentences in natural language. A first approach to the problem has been
to try to isolate the verifiable difficulties in traditional testing systems
(referring in particular to the North American model, which uses multiple-choice
questions). These have been summarized in the following points concerning
multiple-choice tests:

• they concern the results of the learning process, not the process itself

• they emphasize the knowledge level, not the potential for learning

• they are far from working contexts

• memory can sometimes be more useful than comprehension

• so-called test-taking skills can affect the result.

Possible answers to these problems are presented in [27]. In the context of the
present project we would like to highlight two particular points. First of all, the
personalization of tests is possible only in the presence of a student model that stores
a description of the student’s expertise and keeps it up to date. Secondly, the
enlargement of the field of action of the evaluation, from the results to the whole
educational process, makes it possible to use a graph structure.

As a third key element of the project, in order to support experimentation with
any tool or technique of m-learning, a rather complex information system is necessary.
Its role includes distributing didactic material, user identification and authorization,
gathering data on user-system interaction, provisioning mobile
services, supplying statistics on levels of usage and satisfaction, etc. From this point
of view, the project attempts to interconnect m-learning technologies with e-learning,
and e-learning is in turn increasingly integrated in the information systems of
academic institutions.

E-learning systems, and Learning Management Systems (LMS) in particular, are
nowadays a key element in the learning processes that take place at Universities, and
they are widely investigated in the literature [28], [29], [30], [31]. Several
implementations are available on the market, for instance LearningSpace™, WebCT™,
Blackboard, etc. [32]. They are in the middle of a transformation from simple support
for on-line learning (as in the case of LMSs) into real information systems (Learning
Information Systems, LIS). As such, they integrate many components of the wide
spectrum of a formative action [33]. Our project needs to integrate such systems with
its specific mobile-computing requirements. This means that we have to
focus mainly on two points. On the one hand we have all the administrative and
back-office processes of a Faculty (e.g. exam registration, didactic design, theses
management, bookkeeping of teachers’ activity, University marketing, etc.).

On the other hand, research attempts to focus on the technological evolution that
has brought about personal mobility and the mobile terminals (PDAs, pocket PCs,
cellular phones, smart-phones, tablet PCs, etc.) that are now present in everyday life.
These tools are interesting for a LIS, since they allow the various actors (such as
students, teachers, administrative personnel, etc.) to have a mobile platform that keeps
them in touch with the LIS wherever they are. The possible applications are therefore
numerous: we can, for instance, think of the possibility for a secretary to communicate
with mobile-technology-enabled students, or of possible mobile collaboration among
teacher and students within a course framework (our research will explore this aspect).

Some work has been done on Learning Management Systems, but the idea of a
University Information System having a mobile component that belongs to the skeleton
of the Information System is still in its infancy. It is clear that it is not
possible to be concerned with single classes of actors without considering the whole
picture, since a LIS aggregates users with different roles. The focus therefore moves
from a system dealing with “courses” to a system that deals with “virtual communities”,
i.e. with a generalized communication space that allows a variety of tools to be used
to support collaboration needs that may arise in various situations. A virtual
community can be supported at various levels by mobile technologies. LIS, in our
definition, become computerized tools that give various kinds of services to virtual
communities. Such services can be adapted to the special needs of a given community.
One research aspect of the present project is therefore linked to virtual communities
and info-mobility related to learning: we intend to study and experiment with how the
activities of an e-learning portal can be integrated with the emerging mobile
technologies. The research group will use an already existing community-oriented
e-learning portal, which has been in use for some time, to integrate and test mobile
technology and related methodologies.

Fig. 1. A general schema of the prototype



4 Evaluating Mobile Learning Settings 

The experience from years of development and use, the advance of technology, and
the development of authoring tools for questions and tests have resulted in a
sophisticated, computer-based assessment system. However, there is still a lot of room
for further development. Some of the current ideas for development are discussed in the
remainder of this section. In line with many writers in the field of assessment, we
distinguish three types of assessment:

• diagnostic assessment; it provides an indicator of a learner’s aptitude for a
programme of study and identifies possible learning problems;

• formative assessment; it is designed to provide learners with feedback on progress 
and informs development but does not contribute to the overall assessment; 

• summative assessment; it provides a measure of achievement or failure made in 
respect of a learner’s performance in relation to the intended learning outcomes of 
the programme of study. 

The most common distinction in the literature is that made between formative as- 
sessment and summative assessment. A formative computer-based test is described as 
one where the results of the test do not contribute to a student’s final grades. Instead, 
the student’s scores are used to assist in improving the student’s learning, often by 
identifying weaknesses in the student’s knowledge and understanding of a given area 
or by helping them to identify and correct misconceptions. In a similar way, lecturers 
can also make use of the results obtained to help them improve their teaching by iden- 
tifying areas that students have found difficult to understand. Nonetheless, in many 
assessment activities the difference is not so evident. 

A primary aim of assessment is to provide the information needed to improve future
educational experiences, because it gives feedback on whether the course and
learning objectives have been achieved to a satisfactory level. Yet, it is important that
the assessment data be accurate and relevant, so that informed decisions can be made
about the curriculum [34]. As discussed above, formative assessment can also be used
to help bridge the gap between assessment and learning. This may be achieved
particularly where assessment strategies are combined with useful feedback, and
integrated within the learning process [35].

This feedback need not be limited to correct/incorrect responses, but can include 
detailed textual feedback about answers and the topic area of the question. Formative 
assessment can assist in consolidation of learning, and in identifying weaknesses in 
assumed understanding. We think that it would be helpful to be able to deliver the
same questions in a number of modes (for example help mode, exercise mode and exam
mode), with the test author being able to configure this to their own requirements. The help
mode supports students when they start out on their learning; accordingly, the ques- 
tions are delivered with maximum feedback including hints, visible marking on 
screen and the chance to reveal a correct answer. Exercise mode restricts the help to 
just visible ticks and crosses on screen for right and wrong responses. Finally, exam 
mode presents questions with no option for revealing answers and no ticks/crosses 
appearing. 
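
As an illustration, the three delivery modes can be written down as a small configuration table. This is only a minimal sketch: the names `Mode`, `Feedback` and `feedback_for` are hypothetical and do not come from the system described here, and the default values simply restate the description above.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Mode(Enum):
    HELP = "help"
    EXERCISE = "exercise"
    EXAM = "exam"


@dataclass(frozen=True)
class Feedback:
    hints: bool          # textual hints are shown with the question
    marking: bool        # ticks and crosses are visible on screen
    reveal_answer: bool  # the learner may reveal a correct answer


# Defaults restating the text above (an assumption, not the real configuration).
DEFAULTS = {
    Mode.HELP: Feedback(hints=True, marking=True, reveal_answer=True),
    Mode.EXERCISE: Feedback(hints=False, marking=True, reveal_answer=False),
    Mode.EXAM: Feedback(hints=False, marking=False, reveal_answer=False),
}


def feedback_for(mode, overrides=None):
    """Return the feedback settings of a mode, with optional author overrides."""
    base = DEFAULTS[mode]
    return replace(base, **overrides) if overrides else base
```

A test author who wants hints in exercise mode would call `feedback_for(Mode.EXERCISE, {"hints": True})`; the same question can then be rendered differently depending only on the returned settings.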

Our summative strategy consists of two phases: the first finds the approximate
level of the student, and the second gives the student the right mark using a set of
questions customized to his or her capabilities. The preliminary examination contains,
for every subject, two or more questions for each difficulty level. The score obtained by
the student in the first test is used to choose the questions to propose in the second
test. Using this technique we can build a test that is not redundant (due to the
adaptivity), while the first test set is the same for every student, so we can collect
data on the quality of the items. Diagnostic assessment is quite similar. In particular,
the two-session strategy is the same. The main difference is that it is taken before
starting a course, to decide what kind of resources will be used. In this case, the
system knows nothing about the student’s knowledge; it also records the scores of every
answer, so the system can use them when it needs to explain a topic already scored.
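
The two-phase strategy can be sketched as follows. This is a simplified illustration under stated assumptions, not the selection algorithm actually used: the three difficulty levels, the 50% threshold and the function names are all hypothetical.

```python
import random

LEVELS = (1, 2, 3)  # hypothetical difficulty levels, from easy to hard


def estimate_level(answers):
    """Estimate the student's level from the first (common) test.

    answers: list of (difficulty_level, is_correct) pairs.
    Returns the highest level at which at least half of the questions
    were answered correctly, or the lowest level if none qualifies.
    """
    estimate = LEVELS[0]
    for level in LEVELS:
        results = [ok for lvl, ok in answers if lvl == level]
        if results and sum(results) / len(results) >= 0.5:
            estimate = level
    return estimate


def build_second_test(item_bank, level, size, seed=None):
    """Pick up to `size` items of the estimated level for the second test.

    item_bank: list of dicts with at least the keys "id" and "level".
    """
    rng = random.Random(seed)
    candidates = [item for item in item_bank if item["level"] == level]
    return rng.sample(candidates, min(size, len(candidates)))
```

Because every student takes the same first test, the answers collected there remain comparable across students, while the second test differs from student to student.
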

When an exam session is completed, we will have a score for every candidate and
for every question. To obtain a human-understandable mark we used a function
depending on two parameters a and p. We used this function in a large number of real
cases and the experimental data showed that the choice of a is important to obtain
well-distributed marks. This value can be adjusted after the test correction, in
response to the candidate’s answers. Moreover, useless items may be discovered. The
value p is used to give full marks.

To compose tests easily from a set of items and correct them, the system uses nor- 
malized questions and manages the item weighting: when an author creates a course, 
he sets weights that will influence the automatic item selection and the scoring algo- 
rithms. Some of the available forms of assessment strategies included in the proposed 
system are: 

• true/false, 

• multiple-response question; it is defined as a question in which the candidate is 
required to select two or more correct answers from a list of options. Both the 
number of correct answers and the number of options may vary. We consider the 
following three principal modes: i) constrained selection: the student is forced to
make a prescribed number of selections, usually the same as the number of correct 
answers; ii) partially constrained selection: the student may make any number of 
selections up to the number of correct answers; iii) unconstrained selection: the 
candidate may make any number of selections up to the maximum number of op- 
tions, 

• extended matching item and drag and drop question types share the same process 
of selection. In either case the student is required to select a number of items from 
a list then enter or move them to their correct positions. Thus the candidate must 
make two selections - which item and where to put it. The simplest form of scoring
allocates a positive score for each item correctly positioned,

• image hot spot, 

• code writing. 
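
The three selection modes for multiple-response questions, and the simplest scoring of matching and drag and drop questions, can be sketched as below. The function names and the mode labels are hypothetical; the rules themselves restate the list above.

```python
def selection_allowed(mode, n_selected, n_correct, n_options, required=None):
    """Check whether a number of selections is admissible in a given mode.

    required: prescribed number of selections for the constrained mode;
    by default it equals the number of correct answers.
    """
    if mode == "constrained":
        return n_selected == (n_correct if required is None else required)
    if mode == "partially_constrained":
        return 0 <= n_selected <= n_correct
    if mode == "unconstrained":
        return 0 <= n_selected <= n_options
    raise ValueError(f"unknown selection mode: {mode}")


def score_positions(placements, solution, points=1):
    """Simplest scoring: `points` for each item placed in its correct position.

    placements, solution: dicts mapping an item to a position.
    """
    return sum(points for item, pos in placements.items()
               if solution.get(item) == pos)
```

For example, with two correct answers out of five options, three selections are rejected in constrained and partially constrained mode but accepted in unconstrained mode.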

The process of assessment involves gathering information from a variety of 
sources to develop a rich and meaningful understanding of student learning. Modern 
computer assisted assessment packages are capable of storing and analysing vast 
amounts of information on student learning. With appropriate analysis this data can be 
used to identify the strengths and weaknesses of individual students and match them 
to learning resources that meet their needs. 

Finding appropriate, high-quality resources has now become a significant challenge.
Filtering and retrieval tools based on users’ requirements and interests should
therefore be developed to improve the use of these resources. Information filtering
systems can help learners by eliminating irrelevant information, operating like
mediators between the sources of information and the learners. Personalized filtering
should also take into account not only long-term interests but also short-term
requirements. For these purposes, we consider relevant the integration of a
hybrid recommender system that combines content analysis with the development of
virtual clusters of students and of didactical sources. This information management
system provides facilities to use the huge amount of digital information according to
the student’s personal requirements and interests, with special focus on the
development of new algorithms and intelligent applications for personalized information
classification and filtering. In this way data can be obtained about which material is
proving to be most effective in raising student achievement. Taken together with the
profiles of student strengths and weaknesses, this may prove an effective tool for
identifying which resources are most suitable for each student, giving them an
individual program of study, tailored to their needs.
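
One possible reading of such a hybrid recommender is a weighted blend of a content-based score and a score derived from the student's cluster. The sketch below is an assumption made for illustration only: the keyword-count representation, the weight `alpha` and the function names are hypothetical, and the text does not specify the actual algorithms.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def hybrid_score(profile, resource_terms, cluster_ratings, alpha=0.5):
    """Blend content similarity with the mean rating from the student's cluster.

    profile, resource_terms: Counters of keywords (content analysis).
    cluster_ratings: ratings in [0, 1] given to the resource by students
    of the same virtual cluster.
    alpha: hypothetical weight of the content-based part.
    """
    content = cosine(profile, resource_terms)
    collaborative = (sum(cluster_ratings) / len(cluster_ratings)
                     if cluster_ratings else 0.0)
    return alpha * content + (1 - alpha) * collaborative
```

Resources can then be ranked for each student by decreasing `hybrid_score`; short-term requirements could be modelled by building `profile` from recent activity only.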

The assessment process could be organized in the following phases: 

a) Creation of the architecture for the management of the evaluation moments for the
whole formative process: that is, building the teaching interface (through mobile
devices and through fixed Web positions), the student interface (through mobile
devices such as cellular telephones, PDAs, smart-phones etc.) and the administrative
interface, for example for the creation of authorized teachers and students.

b) Creation of the test databases, organized in atomic sets of different kinds of
requests (multiple choice, open, closed, fill in the gap, sentence building, problem
solving ...). Note that sentence building is applicable to very different contexts, for
example program writing (building of code) and the specialized languages of
hypothetico-deductive disciplines: in these cases we should use words extracted from a
predefined vocabulary and verify that a detailed set of rules is respected.

c) Effectiveness and consistency analysis of the databases produced in the previous
phase, through the application of “item analysis specifications” (on real cases).

d) Management of the various assessment processes. The distinction of the evaluation 
moment affects the management, for example, the choice of questions to be submitted 
to each student. 

e) System evaluation, with experimentation on the main platforms that currently equip
the most widespread PDAs on the market. We intend to test the project with different
student groups, for example in the “Programming” course (Laurea Triennale in Scienze
dell’Informazione, Cesena) and the “Artificial Intelligence and E-learning” course
(Laurea Specialistica in Informatica, Bologna).
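
The text does not say which statistics the “item analysis specifications” of phase c) rely on. As an illustration only, two indices from classical item analysis are sketched below: the difficulty index (fraction of correct answers) and the upper-lower discrimination index. The 27% group size is a common convention, assumed here, and the function names are hypothetical.

```python
def difficulty_index(item_correct):
    """Fraction of candidates who answered the item correctly (0 to 1)."""
    return sum(item_correct) / len(item_correct)


def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower discrimination index of one item.

    item_correct: 1/0 per candidate for this item.
    total_scores: total test score per candidate, in the same order.
    Compares the best and the worst `fraction` of candidates by total score.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower
```

An item with a discrimination index close to zero (or negative) does not separate strong from weak candidates and is a candidate for removal from the database.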



5 Adapting a Learning Management System to Infomobility 

As already mentioned, a rather complex information system is needed in order to
support the experimentation of any tool or technique of m-learning. The role of such a
system includes distributing didactic material, user identification and authorization,
gathering data on user-system interaction, provisioning mobile services etc. The
objective of the project is to obtain a unified platform where the various actors can
use different communication services, both mobile and not. In this regard, e-learning
systems in general, and more specifically Learning Management Systems (LMSs), are by
now a vital component in the distance education field. We have to
integrate the LMS with two different classes of processes:

- on one hand, processes connected with the administrative (back-office) activity of
a faculty (like registering exams, programming the teaching activity, theses
management, bookkeeping of the lecture hours, faculty marketing etc.): all such
processes have important overlaps with processes managed by an LMS.

- on the other hand, technology evolution has pushed toward a strong mobility of all
the actors, and has furnished mobile devices (PDAs, pocketPCs, cell-phones,
smart-phones, tablet PCs) that accompany the user in everyday life. Such tools can
become additional terminals for a LIS, because they allow all actors (students,
teachers, secretaries, dean, tutors, administrative personnel etc.) to stay in touch
with the LIS wherever they are.

The number of possible applications is huge: for instance, the possibility for the
administration to communicate in real time with students equipped with such devices,
new forms of collaboration among students and teachers within a university course,
the chance for the students to interact with one another regarding the courses etc. The
focus therefore moves from a system that is based on “offering courses” to a system
based on the idea of “virtual community”. A virtual community is a highly generalized
collaboration space. In this way, a course given by a teacher, a seminar, the
group of students preparing their thesis with the same teacher, students working
together on a project, etc. are all instances of virtual communities. A LIS becomes a
computer-based tool that gives services to virtual communities, and must be adapted
to the specific needs of each particular community. We have already built, over several
years, a community-oriented learning portal. Starting from this existing background,
we intend to experiment with various ways to support collaboration among users
interconnected by mobile technologies through the already active portal based on our
LIS.

The adaptation of the Learning Information System to info-mobility will require
several steps:

a) Extension of the traditional functions of a Learning management system to the 
mobile-computing needs required by the project. This will imply the creation of 
teacher-system-student interaction tools mainly based on SMS messages concern- 
ing the activities of these actors in the system. Moreover, the portal will provide an 
access point to the system’s actors, in order to download the educational material 
and the self-evaluation tests produced according to the objectives of the project. 
Besides, different structures will be created to support the research activities, like 
forums usable via mobile technologies, mailing lists for the various users, and
management of some virtual communities (students enrolled in a course, participants in
laboratories etc.).

b) Distribution of the educational material specifically created for use on
mobile equipment. This will concern both the educational materials and the
self-evaluation tests of point c).

c) Integration of the self-evaluation system into the LIS. This system will allow con- 
ducting tests on the main platforms that currently equip the most widespread PDAs 
on the market. We chose to produce self-evaluation applications for both PDA
environments because we want to extend the experiment as much as possible and, above
all, because we want to create a self-assessment mechanism that is as general as
possible with respect to technological platforms, given the extreme volatility of the
market.

As regards the development of the system, we had to decide on which devices to
concentrate our development. This is a very important issue, as the market is
continuously changing, with new products emerging every day. It is therefore practically
impossible to have a general mechanism that covers all the devices currently available.
We found the following devices useful for our experimentation:

• GSM/GPRS cellular phones

• PDA 

• Smart-phones 

• UMTS telephones 

• Tablet PCs 

The main platforms have already been identified. On one side are the platforms
based on Symbian OS (which means involving the cellular phone market of the biggest
world producers), and on the other side the
platforms equipped with Windows CE, i.e. the PDAs that present points of contact
with the Windows desktop environment in terms of applications and working
environment. We will also experiment with Palm OS, so that our experiment will
cover a very large share of the market. In the first step of the project, however, the
choice of some Microsoft™-dependent PDAs is due mainly to the consideration
that most of the educational material is currently produced with Microsoft™
software tools, especially PowerPoint and Word. In this sense, a device equipped with
a Microsoft™ operating system will facilitate the interchange of educational material
already available. However, the modular structure of the approach followed in the
building of web services based on XML and SOAP will provide a sufficient degree of
extensibility of our mobile platform to other PDAs, like those that are equipped with
Symbian OS.

The test of the system will consist of some lessons conducted using learning
objects distributed through the LMS and used by students and teachers on PDAs,
traditional viewers (like PowerPoint and Acrobat Reader) and other available mobile
devices. Part of these educational materials will be available only through mobile
devices: students will have to learn by studying only on PDAs. In this way, different
groups that have studied on different devices with different approaches will be
available for our research: those who followed face-to-face lessons, those who studied
on learning objects without following the lessons, and those who studied on mobile
devices. By creating a specific and calibrated set of tests, we want to verify the level
of learning of the single groups, analyzing the differences and the reasons behind
them. The results of these tests will be matched with the results of the self-evaluation
tests distributed to the students, in order to verify thoroughly the level of learning
reached by the students. The reactions of the students will also be analyzed, especially
those related to problems in studying with a new but limited tool like a portable
device. For this purpose, a forum on the web will be specifically activated, and some
tutors will be available in order to help students with practical or technical problems.

As regards the use of specific tools available with mobile technology, the most
evident problem we faced in the design phase was the choice of the technology with
which to build the tools provided to the client in order to use our services. The
current project provides ten different classes of services to mobile users, but in order
to simplify the choice, we decided to concentrate initially on two different services for
mobile devices:

• The management of SMSs sent by teachers to students or by administrative staff to
teachers and students when particular events happen (meetings, reminders of expiration
dates etc.)

• The consultation of a common agenda (we call it organizer) that will be available
on the mobile device and will keep all the important dates for the actor (mainly
students and teachers)

The first service is quite simple to build but not so easy to manage if the LMS that
operates behind the scenes does not have all the information needed. The main problem
lies in allowing the right persons to send and receive SMSs, and in
granting this permission within appropriate limits on the number of SMSs
a user can send. The second service is under development and is more complicated,
as it involves one of the most difficult tasks to manage inside an LMS, i.e., time
management. We are currently building a system that allows students and teachers to
connect with their mobile device and consult their agenda, dynamically built with all
the events that could happen during normal university activity. This implies a great
effort of abstraction and integration between the LMS platform and the mobile
devices. We have evaluated five different alternatives to build the interaction between
the PDA (the platform chosen for the experimentation) and the central database. The
problem is related to the way the client (the PDA) queries the remote server
module, requesting the events updated since the last connection. These are the
alternatives we evaluated and tested, from the simplest to the most complicated:

• Using the embedded browser of the PDA to navigate through the web pages that
web users see using the traditional browser available for desktop PCs. This is
the simplest solution, both for the users and for the development team. Only
particular attention to screen adaptation is necessary, in order to concentrate the most
important information in the upper-left part of the screen and to avoid the
need for frequent scrolling. The web pages will be created using device-specific
tags and languages, like the .NET™ mobile toolkit, in order to navigate through
the data available on the server. However, we decided not to follow this solution as
the primary one, because the user must be constantly connected to
the Internet to navigate through the organizer, thus requiring permanent connections
(like Wi-Fi settings) or a significant expense for the students and the teachers
when connected to the net using GPRS technology. In Italy this solution is very
costly at the moment, and Wi-Fi technology with wireless LAN is still in its infancy.
Other short-range connection solutions have been abandoned, as we want
this service to be used outside the campus.

• Using a client database application, built specifically for mobile devices, that
queries the server DB through the Internet, synchronizing the data on the mobile
device. This is a proprietary solution bound to the back-end DB used and to the
availability of an Internet connection on the PDA, and it also requires quite
complicated settings from an end-user perspective. However, from our tests, this solution
has the advantage of dramatically boosting performance, thus reducing connection
times.

• Synchronizing the PDA with the central database and the agenda of the user by 
using cradles and database synchronization: this solution will solve a lot of issues, 
but creates a problem in terms of cradle availability around the campus, and espe- 
cially the problem of supporting different cradles for different models of PDA. 

• Building a client/server application in which the client (on the PDA) uses
traditional RPC/RMI mechanisms to invoke server methods in order to receive data.
This has the advantage of requiring only short connections to the central system,
and could be personalized to the PDA device. The disadvantage of this solution is
the proprietary mechanism of communication between server and client, and also
the necessity of using particular TCP/IP - UDP ports that could complicate the
management of security on the server side due to firewalls.

• Building a web application that requests a web service from the server through the
use of XML/SOAP messages. This is the best solution we found, as it provides
quick access to the central database through open technologies
like XML/SOAP, uses a port that is already open for web access, and
allows the client part to be extended to other PDAs simply by creating
a new client interface to the web service. We will therefore provide the
agenda synchronization through a web service that will recognize the user, verify
the state of his/her agenda, and send an XML-formatted packet of data regarding
the latest events in the system. The client side of the application, specific for the
device, will format this data for the display: after that, the connection with the server
will be closed and the navigation on the agenda will be completely off-line.
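
The exchange behind the last alternative can be sketched as follows. This is a deliberately simplified illustration: it uses a plain XML packet rather than a full SOAP envelope, and the element names (`agendaUpdate`, `event`), the attribute names and the function names are hypothetical, not the format used in the project.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def build_update(user, events, since):
    """Server side: serialize the events modified after `since` for `user`.

    events: list of dicts with the keys "id", "title", "start", "modified".
    """
    root = ET.Element("agendaUpdate", {"user": user})
    for ev in events:
        if ev["modified"] > since:
            node = ET.SubElement(root, "event", {
                "id": str(ev["id"]),
                "start": ev["start"].isoformat(),
            })
            node.text = ev["title"]
    return ET.tostring(root, encoding="unicode")


def parse_update(xml_text):
    """Client side: turn the packet into records to be stored on the device."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": int(node.get("id")),
            "start": datetime.fromisoformat(node.get("start")),
            "title": node.text or "",
        }
        for node in root.findall("event")
    ]
```

After `parse_update` has stored the records locally, the connection can be closed and the agenda browsed off-line; the client only needs to remember the time of its last successful synchronization to pass as `since` next time.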



References 



1. Waycott J., An Investigation into the Use of Mobile Computing Devices as Tools for
Supporting Learning and Workplace Activities, 5th Human Centred Technology Postgraduate
Workshop (HCT-2001), Brighton, UK, September 2001, available online at
http://www.cogs.susx.ac.uk/lab/hct/hctw2001/papers/waycott.pdf

2. Roibas A.C., Sanchez I. A., Design scenarios for m-learning, Proceedings of the European
Workshop on Mobile and Contextual Learning, pp. 53-56, Birmingham, UK, June 2002

3. Rogers T., Mobile Technologies for Informal Learning - a Theoretical Review of the
Literature, Proceedings of the European Workshop on Mobile and Contextual Learning,
pp. 19-20, Birmingham, UK, June 2002

4. Dvorak J. D., Burchanan K., Using Technology to Create and Enhance Collaborative
Learning, Proc. of 14th World Conference on Educational Multimedia, Hypermedia and
Telecommunications (ED-MEDIA 2002), Denver, CO, USA, June 2002

5. Kukulska-Hulme A., Cognitive, Ergonomic and Affective Aspects of PDA Use for
Learning, Proceedings of the European Workshop on Mobile and Contextual Learning,
pp. 32-33, Birmingham, UK, June 2002

6. Waycott J., Scanlon E., Jones A., Evaluating the Use of PDAs as Learning and Workplace 
Tools: An Activity Theory Perspective, Proceedings of the European Workshop on Mobile 
and Contextual Learning, pp. 34-35, Birmingham, UK, June 2002 

7. Keegan D., The future of learning: From eLearning to mLearning, available online at 
http://learning.ericsson.net/leonardo/thebook/book.html 

8. Trifonova A., Ronchetti M., Where is m-learning going?, in Proceedings of E-Learn
2003, Phoenix, Arizona, USA, November 7-11, 2003

9. Trifonova A., Mobile Learning - Review of the Literature, DIT Technical Report
DIT-03-009, available online at http://eprints.biblio.unitn.it/archive/00000359/01/009.pdf

10. Mobile Learning Explorations at the Stanford Learning Lab: A newsletter for Stanford
academic community. Speaking of Computers, Issue 55, January 8, 2001, available online
at http://acomp.stanford.edu/acpubs/SOC/Back_Issues/SOC55/#3

11. Trifonova A., Knapp J., Gamper J., Ronchetti M., Mobile ELDIT: challenges in the
transition from an e-learning to an m-learning system, in printing, University of Trento

12. HandLeR project web site: http://www.eee.bham.ac.uk/handler/default.asp 







13. Ketamo H., mLearning for kindergarten’s mathematics teaching, Proc. of IEEE
International Workshop on Wireless and Mobile Technologies in Education (WMTE 2002),
pp. 167-170, Vaxjo, Sweden, August 2002

14. Stone A., Briggs J., Smith C., SMS and Interactivity - Some Results from the Field, and its
Implications on Effective Uses of Mobile Technologies in Education, Proc. of IEEE
International Workshop on Wireless and Mobile Technologies in Education (WMTE 2002),
pp. 147-151, Vaxjo, Sweden, August 2002

15. Stone A., Briggs J., ITZ GD 2 TXT - How to Use SMS Effectively in M-Learning,
Proceedings of the European Workshop on Mobile and Contextual Learning, pp. 11-14,
Birmingham, UK, June 2002

16. Seppala P., Mobile learning and Mobility in Teacher Training, Proc. of IEEE International
Workshop on Wireless and Mobile Technologies in Education (WMTE 2002), pp. 130-135,
Vaxjo, Sweden, August 2002

17. Garner I., Francis J., Wales K., An Evaluation of the Implementation of a Short Messaging
System (SMS) to Support Undergraduate Students, Proceedings of the European
Workshop on Mobile and Contextual Learning, pp. 15-18, Birmingham, UK, June 2002

18. Divitini M., Haugalokken O. K., Norevik P., Improving communication through mobile
technologies: Which possibilities?, Proc. of IEEE International Workshop on Wireless and
Mobile Technologies in Education (WMTE 2002), pp. 86-90, Vaxjo, Sweden, August
2002

19. Sariola J., Sampson J. P., Vuorinen R., Kynaslahti H., Promoting mLearning by the Uni- 
Wap Project Within Higher Education, Proc. of International Conference on Technology 
and Education (ICTE 2001), available online at 
http://www.icte.org/T01_Library/T01_254.pdf 

20. Sariola J., What are the limits of academic teaching? - In search of the opportunities of 
mobile learning, TeleLearning 2001 Conference, Vancouver, Canada, available online at 
http://ok.helsinki.fi/tekstit/Article.rtf 

21. Sariola J., What Are the Limits of Academic Teaching? - In Search of the Opportunities of 
Mobile Learning, available online at http://ok.helsinki.fi/pdf_tiedostot/mobileEN.pdf 

22. Seppala P., Sariola J., Kynaslahti H., Mobile Learning in Personnel Training of University
Teachers, Proc. of IEEE International Workshop on Wireless and Mobile Technologies in
Education (WMTE 2002), pp. 23-30, Vaxjo, Sweden, August 2002

23. Traxler J., Evaluating m-learning, Proceedings of the European Workshop on Mobile and 
Contextual Learning, pp. 63-64, Birmingham, UK, June 2002 

24. Collett M., Stead G., Meeting the Challenge: Producing M-Learning Materials for Young 
Adults with Numeracy and Literacy Needs, Proceedings of the European Workshop on 
Mobile and Contextual Learning, pp. 61-62, Birmingham, UK, June 2002 

25. Steinberger C., Wireless meets Wireline e-Learning, Proc. of 14th World Conference on
Educational Multimedia, Hypermedia and Telecommunications (ED-MEDIA 2002),
Denver, CO, USA, June 2002

26. Figg C., Burston J., PDA Strategies for Preservice Teacher Technology Training, Proc. of
14th World Conference on Educational Multimedia, Hypermedia and Telecommunications
(ED-MEDIA 2002), Denver, CO, USA, June 2002

27. Casadei G., Magnani M., Assessment strategies of an intelligent learning management sys- 
tem, accepted for publication in “International Conference on Simulation and Multimedia 
in Engineering Education, 2003” conference proceedings 

28. A'herran A., Integrating a course delivery platform with information, student management 
and administrative systems, in Proc. EDMedia 2001, Tampere, Finland, June 25-30 2001 

29. Hall B, Learning Management Systems. How to Choose the Right System for your Organi- 
sation, Brandon Hall, 2001 







30. McMahon M., Luca J., Courseware Management Tools and Customised Web Pages:
Rationale, Comparisons and Evaluation, Proc. EDMedia 2001, Tampere, Finland, June
25-30 2001

31. Hanna, D. E., Glowacki-Dudka, M. & Conceicao-Runlee, C., 147 Practical tips for
teaching online groups: Essentials of Web-based education, Madison, WI: Atwood Publishing.

32. Aggarwal, A., Web-based learning and teaching technologies: Opportunities and
challenges, Hershey, PA: Idea Group Publishing, 2000.

33. Colazzo L., Molinari A., From Learning Management Systems To Learning Information
Systems: One Possible Evolution Of E-Learning, in Proc. Communications, Internet and
Information Technology (CIIT) Conference, St. Thomas, USA, November 18-20, 2002

34. Huba, M.E. & Freed, J. E., Learner-centered assessment on college campuses. Shifting 
the focus from teaching to learning, Needham Heights, MA: Allyn & Bacon. 

35. Dalziel, J. R., & Gazzard, S., Beyond Traditional Use of Multiple Choice Questions: 
Teaching and Learning with WebMCQ Interactive Questions and Workgroups. Open, 
Flexible and Distance Learning: Challenges of the New Millennium, Collected papers 
from the 14th Biennial Forum of the Open and Distance Learning Association of Austra- 
lia, pp.93-96. Geelong: Deakin University. 



E-Mail on the Move: Categorization, Filtering, 
and Alerting on Mobile Devices with the ifMail Prototype



Marco Cignini, Stefano Mizzaro, Carlo Tasso, and Andrea Virgili 



Department of Mathematics and Computer Science, 
University of Udine 

Via delle Scienze, 206 - Loc. Rizzi - Udine - 33100 Italy 
{cignini, mizzaro, tasso, virgili}@dimi.uniud.it



Abstract. We propose an integrated approach to email categorization, filtering,
and alerting on mobile devices. After a general introduction to the problem, we
present the ifMail prototype, which can categorize incoming email messages
into pre-defined categories; filter and rank the categorized messages according
to their importance; and alert the user on mobile devices when important messages
are waiting to be read. The second part of the paper describes an extended
evaluation of the ifMail prototype, whose results show the high effectiveness
levels reached by the system.



1 Introduction 

Information overload is the main problem for information access users: we are over- 
whelmed by too much information when we browse the Web, when we analyze the 
results of a search engine, when we use a directory, when we read the messages in a 
forum or in a newsgroup, and when we use electronic mail. Electronic mail, histori- 
cally one of the first services made available by the Internet to the large public audi- 
ence, is today one of the major activities of Internet users. All of us rely on email as 
one of the primary communication methods, both at work and at home: email has, at 
least partially, supplanted paper mail, messages, and telephone conversations. 

Email overload is an important facet of information overload: the average user re- 
ceives dozens of messages per day, and the trend is not slowing down at all [32]; 
some of us are lucky and receive a manageable number of email messages per day, 
whereas others are completely overwhelmed; unsolicited email, usually called spam 
or junk-mail, is constantly and worryingly increasing. 

Usage of email is a highly personalized activity, and people use email in amazingly 
different ways [14]. People read emails with different strategies: archivers choose 
strategies that allow them to read everything and not miss anything important, and 
prioritizers want to limit the time spent on email reading to switch to “real work” 
[11]. According to Whittaker and Sidner [33], people can be divided into no filers
(who keep all the messages in their inbox), frequent filers (who constantly clean up
their inbox), and spring cleaners (who clean up their inbox once every few months).

Also, email software tools (Eudora, Outlook, Mozilla, to name just a few) are used 
not only in the standard ways foreseen by email tools designers, i.e., for reading and 
answering messages, but also in more “perverted” ways. We refer here to archiving,
managing a personal agenda, or serving as a reminder tool: people send mail to themselves
as a reminder; people use the inbox message list as an agenda; people use email
for task management and delegation; people hit reply to avoid typing in a long
list of addresses; people archive a whole message when the attachment is an important
document; people use email as a means of file transfer; and so on. This creative use of
email has generated another meaning for the “email overload” expression [32], i.e.,
the overloading of uses of this tool, and because of this phenomenon, email has been
named a serial-killer application [10].

In this scenario, advanced tools for email processing are desperately needed: 
threading, categorizing, archiving, filtering, alerting, and perhaps more. Today’s 
email clients provide these functions in a rather limited way. Mail tools let users view
the messages sorted by date, by thread, by sender, etc. Users can manually categorize
the messages, usually by drag-and-drop in one of a hierarchy of folders. A priority 
flag can be manually attached to a message by the sender, and shown to the receiver 
by the mail client. Filters based on pattern matching rules on (mainly) the structured 
part of messages (i.e., subject, sender, date, priority, size, etc.) can be manually defined
by the user to automatically move the received messages into the appropriate
folder (and to execute other operations on the message). Automatic anti-spam filters, 
to filter out spam exploiting some learning techniques, are common in many mail 
tools. All email tools can notify the user sitting in front of his/her desktop that new 
mail has arrived by visual and/or sound messages. 

These activities are both time consuming and rather ineffective: manually defining
a filter and managing a set of several filters places a high cognitive load on a user
engaged in other activities and, often, the decision whether a message is interesting,
junk, belonging to a certain topical category, and so on cannot be made on the
basis of the structured part of the message alone: it must also take into account the
message body, attachment, meaning, and even context (i.e., the thread to which the
message belongs, the current situation in which the user is, and so on). Also, alerting
is rather neglected: having only a visual and/or acoustic “You have new mail” notification
on our desktop is a rather poor way of communicating, one that ignores both the
cognitive situation of the user, like his/her current task or degree of attention, and
features of the message like its urgency, the sender, the topic, and so on.

The advent of portable devices (cell phones, PDAs, pagers, and so on), which
support various network connection modes (GSM, GPRS, UMTS, Wi-Fi, Bluetooth,
etc.), adds a new and important variable to the scenario sketched above.
There are several issues that need to be addressed. The new environment implies both 
limitations to be taken into account and opportunities to be exploited; therefore, sim- 
ply replicating the non-mobile approach in the mobile world would lead to far from 
ideal solutions. For instance, using a mobile device to access one’s own email inbox via
standard protocols like POP or IMAP is an unsatisfying solution that neglects both the 
always-on modality of a user empowered with a mobile device, and the cost usually 
implied by data transmission on a wireless connection. The usually complex user 
interfaces of mail tools cannot be replicated on small-screen devices, so it is much 
more difficult to have ease of reading and user’s feedback (e.g., explicit feedback of 
relevance, categorization, importance, and urgency of a message is likely to be re- 
placed by more implicit kinds of feedback, perhaps exploiting the time that a message 
waits in the “unread” status). The interaction modes requiring continuous attention 
(e.g., drag-and-drop), that are common for desktop-based tools, are not adequate for 
devices used out there in the real world, with several sources of distraction.

Notifications could and should be delivered on the smaller, portable devices that are
now widely available, using the most appropriate modality (WAP-push, SMS,
etc.). Notifications should depend on features of the received messages
like their number, their importance, the category they pertain to, and so on. The well
known limitations on bandwidth, screen size, and user cognitive load (time, distraction
level, and so on) make it extremely important to have a selective alerting functionality,
capable of notifying the user only when really important messages arrive: not
only would the notification of a spam message be very unpleasant for the user, but
the notification of a “normal” message when the user is in a particular context
(e.g., while driving, or engaged in a meeting, or in an important phone conversation)
can be unpleasant as well. The mobile world requires an integrated solution, exploiting
categorization, filtering and alerting.

Moreover, in the mobile world, categorizing, filtering, and alerting will have an increased
importance, since accessing email by a mobile device is more critical in many
respects. People carry their mobile devices with them, and these devices are therefore much
more intrusive than a standard desktop: the “new mail” sound that might be an acceptable
interruption when sitting in front of a desktop computer is likely to be very
annoying while engaged in real-world critical activities.

Turning our attention to more technical issues, we notice that new mail tools and 
protocols might be designed to allow the user (both as a sender and as a receiver) to 
specify (manually, semi-automatically, or automatically) the alerting modalities of 
certain message categories. Complex engineering solutions are needed because the 
limited storage and computational power available today on the mobile devices, and 
the bandwidth limitations, suggest a server side based solution, in which most of the 
computation takes place on the server and the data transmission on the mobile device 
is limited. 

Also, the integration of all the devices that one can use to read his/her own email 
messages (desktop PC, mobile devices, internet points, etc.) is another interesting, and 
difficult problem, and reinforces the requirement for server side based solutions. A 
further kind of integration is that among all the different kinds of messages that the 
user of a mobile device can receive: besides email-like messages, we have SMS, 
EMS, and MMS (and perhaps more in the future). The integration of all these mes- 
sage services is a difficult problem as well. 

Finally, the increased email access by mobile devices will change the way people use
email: nobody can predict the full range of new “perverted” or “creative” uses that
mobile device users could imagine and adopt when mobile email tools become broadly
available (e.g., the sending of email to oneself as a reminder is likely to become
much more frequent).

All these issues constitute a research agenda for the years to come, and need to be 
tackled from an interdisciplinary standpoint: user modeling, information retrieval and 
filtering, human computer interaction, software engineering, are all disciplines that 
can contribute to the development of more effective email tools for the mobile and 
wireless world. In this paper we do not present a final and general solution. Rather, 
our aim is twofold: (i) to show how to improve and make (at least partially) automatic 
the tasks of email categorization, filtering, and alerting; and (ii) to show how to inte- 
grate these new and more effective tools in the mobile scenario, where people access 
email while on the move. The paper is structured as follows. In Section 2 we highlight 
the main issues related to email categorization and filtering. We also survey the literature,
briefly describing the relevant work that has been proposed so far. In Section 3
we describe the ifMail prototype, from both conceptual and technical perspectives. In
Section 4 an extended experimental evaluation of the effectiveness of our approach is
presented. Section 5 closes the paper and sketches future developments.



2 Categorization and Filtering of Email Messages 

Text categorization (or classification) is the grouping of documents into predefined 
categories [28]. State-of-the-art classifiers automatically built by means of machine 
learning techniques show an effectiveness comparable to manually built classifiers. 

Email messages are very heterogeneous. Examples of variables that can range over
a rather wide set of values are: length, language(s) used, importance of the contained
information, presence/absence of attachments of various kinds, formal/informal tone,
emoticons, jargon. Structured data contained in the header, like date, sender,
subject, and number of recipients, are also subject to wide variations. Given the peculiar
nature of email messages, email categorization is a very particular case of general text
categorization.

Various approaches, mainly derived from the experiments on generic text categori- 
zation, have been applied to email categorization [9]: Cohen [7] uses the RIPPER 
algorithm; Payne and Edwards [24] compare CN2 (a rule induction algorithm) with 
IBPL1 (a modified version of K-nearest Neighbor algorithm using memory based 
reasoning); Rennie [25] exploits naive Bayes classifiers; Segal and Kephart [29] develop
a system for semi-automatic categorization (i.e., the system proposes to the user
three alternative folders for each message) based on TF-IDF; Brutlag and Meek [4]
compare Linear Support Vector Machine, TF-IDF, and Unigram Language Model,
and find that no method outperforms the others; McCreath and Kay [14] show how
the combination of hand crafted and learnt rules is more effective than either approach 
working alone. All these approaches show rather similar results, with accuracy (per- 
centage of messages classified in a correct way) around 70%-80%. An even more 
difficult problem, the clustering of email messages (i.e., given a set of email mes- 
sages, extract the categories and classify the messages in the found categories), is 
tackled in [13]. 

Spam (or junk) email filtering has seen increasing interest in recent years, due to
the increasing amount of unsolicited email: Pantel and Lin [19] and Sahami et al.
[27] exploit naive Bayes classifiers; Androutsopoulos et al. [1] use a memory-based (or
instance-based) approach, implemented as a variant of the K-nearest neighbor (K-nn)
algorithm; Carreras et al. [5] rely on the boosting algorithm AdaBoost to find a highly
accurate classification rule by combining many weak rules.

Anti-spam filtering has been approached as a separate problem from email categorization,
even if, at first glance, it seems just a two-category categorization problem.
However, anti-spam is an easier problem than categorization not only because it handles
just two categories, but also because the two categories are rather well defined (it
is rather easy to define spam), clear-cut (it is rather easy to sort out spam from non-spam),
and objective (usually, what is spam for one user is spam for everybody). In
contrast, email categorization is highly subjective: each user can choose rather different
criteria for creating the categories (e.g., some users divide messages on the basis of
the sender, others on the basis of the topic, others on the basis of their a-priori
categorization of their job activity, and so on); the number of categories can vary a lot
among users; the categories are sometimes not well defined (users can be very well
organized or completely chaotic); and so on. Therefore, it is quite likely that a single
fit-for-all email categorizer is not feasible, and that hybrid approaches are needed.
Indeed, even if it is difficult to have a definitive comparison between the effectiveness
of anti-spam filters and of email categorizers because of the large differences in the
collections used, in the number and features of categories, and so on, it is evident that
the effectiveness of anti-spam filters (95% precision) is rather higher than that of
general email categorizers.

The alerting problem is much less studied than email categorization and filtering:
further research in terms of notification modalities, prototype implementation and
evaluation, and user studies is needed. It seems obvious, in any case, that only important
messages should trigger notifications on mobile devices, to avoid imposing high cognitive
load and distraction on the user. Therefore, an integrated solution, comprising categorizing,
filtering and alerting is required.

The evaluation of the effectiveness of an email tool is not simple at all. The most 
naive approaches show several limitations. Relying on general test collections like
TREC (http://trec.nist.gov/) is not adequate, since the peculiar nature of email makes 
an email message different from a generic document. Usenet news seem more similar, 
but again differences do exist: for instance, an email message body usually starts with 
the name of the recipient, whereas this is obviously less frequent for Usenet messages. 

Privacy is also an important issue: since email messages contain private data, few 
people are willing to make public their messages; perhaps those people will anyway 
clean some of the more compromising and confidential messages, thus making avail- 
able only a portion of their message archive, that is not a good sample at all; anyway, 
people willing to make public their email archives are not a good sample for sure, 
since people that are more reserved are completely left out; and relying on messages 
archives of mail lists leads again to a biased sample. 



3 The ifMail Prototype 

At the University of Udine we have started to study some of the issues described above
and, on the basis of our work in the last 10 years, we have developed the ifMail prototype.
ifMail handles, with a content-based approach, categorization and filtering of email
messages, and alerting on mobile devices. The overall operation of ifMail is shown in Fig. 1.
The messages in the incoming stream are processed to extract the internal representations
used in subsequent steps. The internal representation contains term/weight
pairs (the weight representing the importance of each term), corresponding to both the
structured part and the body of the email message. Categorization is obtained on the
basis of a profile attached to each user-defined folder and dynamically updated by
means of user’s feedback. The profile contains two parts: a frame for the information
included in the structured part of email messages, and a semantic network for the
conceptual content of the body of messages [16]. The profile is matched with the
internal representation of the incoming messages and the message is classified according
to its content. The matching takes into account both the structured and unstructured
parts of email messages. Filtering, performed by re-using the evaluation made in
the categorization phase, singles out the most relevant messages in each folder, and
alerting takes charge of notifying the user’s mobile device of these messages. Our
notion of filtering is therefore more general than just anti-spam filtering: ifMail tries
to associate with each message a numeric figure representing the importance that the
message has for the user.




ifMail categorization and filtering are based on the IFT (Information Filtering 
Tool) system [14,16], capable of profile building, storing, and matching. IFT has been 
developed on the basis of the UMT (User Modeling Tool) shell [0] and has been ap- 
plied to a variety of systems and domains, e.g., Web filtering [2], filtering of enter- 
prise documents [30], and filtering of scholarly publications [17]. IFT matches the 
profile associated to each category with the internal representation of each message 
and returns a result made up of three values: 

1. Coverage: the percentage of the most relevant concepts of the profile which are
also present in the document, computed taking into account also their weights.

2. Match: a measure of how much the concepts of the profile are present in the
document (i.e., they are more or less numerous in the document).

3. Rank: a synthetic value (ranging from 0 to 5), which is obtained as a combination
of the previous two values.

Categorization is performed on the basis of all three values; filtering is based on
the Rank score only.
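The paper does not give the formulas behind the three values, so the following is only a minimal sketch of how such a triple could be computed. It assumes that both the profile and the message are flat term/weight dictionaries (the actual IFT profile is a frame plus a semantic network), and the names match_profile and top_k, as well as the specific formulas, are hypothetical illustrations rather than the IFT implementation.

```python
def match_profile(profile, message, top_k=5):
    """Return (coverage, match, rank) for one profile/message pair.

    profile and message map term -> positive weight. The formulas are
    illustrative assumptions, not the ones used by IFT.
    """
    if not profile or not message:
        return 0.0, 0.0, 0
    # Coverage: weighted share of the profile's top_k concepts found in the message.
    top = sorted(profile, key=profile.get, reverse=True)[:top_k]
    top_weight = sum(profile[t] for t in top)
    msg_weight = sum(message.values())
    if top_weight == 0 or msg_weight == 0:
        return 0.0, 0.0, 0
    coverage = sum(profile[t] for t in top if t in message) / top_weight
    # Match: weighted share of the message that consists of profile concepts.
    match = sum(w for t, w in message.items() if t in profile) / msg_weight
    # Rank: combine the two values into an integer between 0 and 5.
    rank = round(5 * (coverage + match) / 2)
    return coverage, match, rank


profile = {"exam": 3.0, "course": 2.0, "lecture": 1.0}
message = {"exam": 2.0, "room": 1.0, "course": 1.0}
coverage, match, rank = match_profile(profile, message)
```

Under these assumptions, a message is assigned to the category whose profile yields the best triple, and the Rank alone can later be reused for filtering and alerting.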




[Diagram; recoverable labels: User, Notification of interesting messages.]

Fig. 2. ifMail overall architecture.

Fig. 2 shows the overall architecture of the ifMail system. The main modules are: 

• WebMail, that allows the user to access email functionalities via a Web browser. It 
has been developed specifically for this project in order to connect and integrate 
categorizing, filtering, and alerting. More specifically, the WebMail module im- 
plements the only user interface of the system and it allows the configuration of the 
innovative services. 

• Mail Filtering and Classification Engine, made up of the following three submodules:

a) Monitoring Agent, that monitors the arrival of new messages and calls the 
categorization and filtering operations. ifMail supports POP and IMAP 
servers, and any number of email accounts. 

b) Internal Representation Builder, that parses the text of message subject and 
body, removes stop words, extracts the stem of the terms, and builds the in- 
ternal representation of the message, stored in the Internal Representation 
Database. 

c) Categorization, that executes categorization and handles feedback data. This 
module contains, and relies on, the IFT submodule: IFT compares the in- 
ternal representation of the incoming message with each category profile, 
and modifies the category profile according to user’s relevance feedback.

• Multi Channel Alerting, that, on the basis of the categorization results and of user’s
personalized settings, immediately notifies the user of the most relevant messages
via a mobile device.
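The steps performed by the Internal Representation Builder (parsing subject and body, removing stop words, stemming, and producing term/weight pairs) can be sketched as follows. This is only an illustration under stated assumptions: the stop-word list, the toy suffix-stripping stemmer, the frequency-based weights, and the subject_boost parameter are all hypothetical, since the paper does not specify which stemmer or weighting scheme ifMail uses.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; a real list is much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def crude_stem(word):
    # Toy suffix stripping; a real system would use a proper stemmer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_representation(subject, body, subject_boost=2.0):
    """Return a dict term -> weight for one message.

    Weights are plain term counts, with subject terms counted
    subject_boost times (an assumed weighting, for illustration only).
    """
    weights = Counter()
    for text, boost in ((subject, subject_boost), (body, 1.0)):
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in STOP_WORDS:
                continue
            weights[crude_stem(token)] += boost
    return dict(weights)


representation = build_representation("Meeting moved", "The meeting is moved to Monday")
```

The resulting dictionary plays the role of the internal representation that is later compared with each category profile.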

Fig. 3 shows a snapshot of ifMail Web user interface: a quite standard email inter- 
face that allows standard mail management and that provides the commands and visu- 
alization items relevant to the new categorization and filtering features. The number 
of stars associated to each message is given by the Rank score associated to the mes- 
sage. 

The PDA screenshots in Fig. 4 show the multi-channel alerting of ifMail: in the 
screenshot on the top, the notification of the arrival of a new relevant message for the 
“myWork” category is shown. The user can detect (by the number of stars) the mes- 
sage relevance computed by the system, he can archive the message, read message 
data like sender and subject, or read the whole message body (screenshot below). 

The system has been developed with an XML-based technology to allow a higher 
flexibility for the presentation layer: multiple interfaces are generated by means of 
XSLT transformations, that produce the output in the markup language suitable for 
the requesting device by applying the corresponding style sheet to a common set of 
XML data. In this way, the service is accessible by a wide range of devices such as
PDAs, smartphones, and cell phones, provided that they comply with the WAP 1.2.1 or
2.0 standards [19, 21, 22, 25].

The interface design has been developed according to some guidelines for informa- 
tion access with mobile devices [5, 7, 11, 20]. The navigation through the pages of the 
service has been designed considering the physical interface used for the interaction 
with the device. Moreover, the complexity and extension of every page of the service 
are adapted to the dimension and capabilities of the display of the mobile devices. 
From a functional point of view, the interfaces are down-scaled (going from the PC 
version to the WAP version) to reduce the complexity of the service, considering the 
limited resources of the devices and the mobile context of use of the service. 



4 Experimental Evaluation 

We have discussed in Section 2 the intrinsic limitations in the evaluation of advanced 
email tools, and some of the issues that make the evaluation of these tools a difficult 
task. In order to overcome these limitations, we have designed and carried out an 
extensive evaluation of the ifMail prototype, taking also into account previous ex- 
perimental work carried out in recent years in our laboratory. The goal of the ex- 
perimental activity has been the evaluation of categorization, filtering, and alerting 
capabilities of ifMail. We have run various simulations on 6 collections of email and 
newsgroups messages (Table 1). We have used the term “simulation” since the ex- 
periments have been performed in a simulated environment in which the typical ac- 
tions that a user could perform on ifMail can be repeated at will, without engaging 
(and overloading) real users.

Fig. 3. ifMail user interface for Web mail.

Fig. 4. ifMail user interface: email reading on a PDA (left) and folders of new categorized
messages on an Openwave WAP phone simulator (right).



Obviously, with this approach, we have intentionally not evaluated the usability of 
the user interface, nor did we want to claim the effectiveness of our system in absolute
terms. On the other hand, given the early development stage of the ifMail prototype, 
we were interested in evaluating some design decisions and in harvesting an experi- 
mental set of real data with a quick, light, and formative evaluation, capable of giving 
us hints on how to proceed with the development of the system. 

Table 1 provides basic data on the six message collections we have used: two of
them come from real users, and include all the messages received over a period of
about 30-40 days, with none eliminated. Both users (one of them is the third author of this
paper) defined a set of categories (folders), to be used for evaluating the classification
capabilities.

The collections extracted from newsgroups concern a similar number of messages
and categories, with the exception of collection F, which is significantly larger and
was considered for evaluating whether the results obtained with similar collections (A
through E) were maintained under a much heavier load.

Table 1. Email message collections used in the experiments.

Message kind          Collection   Number of categories   Total number of messages
Personal messages     A            9                      540
                      B            7                      645
Newsgroups messages   C            7                      525
                      D            6                      450
                      E            7                      540
                      F            16                     1309



We have defined two different modes of operation for ifMail:

• Mode One-by-one, in which ifMail provides only an advice: the user reading a 
message is shown a hint on which category(ies) are likely to be the correct destina- 
tion of that message. By confirming or not confirming on each single message the 
(automatically) proposed categorization, the user provides relevance feedback, ex- 
ploited by the system to update the relevant category profiles. 

• Mode Session, in which ifMail automatically categorizes all the messages received 
during the current day (we have assumed daily batches of fixed size including 15 
messages per day). The user provides relevance feedback only after all these cate- 
gorizations have been done. 
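The difference between the two modes lies only in when relevance feedback reaches the profiles. The following sketch makes this explicit with a deliberately simple stand-in categorizer (term-count overlap); the functions categorize and simulate, and the representation of messages as lists of terms, are hypothetical and are not the IFT matching procedure.

```python
from collections import Counter


def categorize(profiles, terms):
    """Return the category whose profile overlaps most with terms, or None."""
    best, best_score = None, 0
    for category, profile in profiles.items():
        score = sum(profile[t] for t in terms)
        if score > best_score:
            best, best_score = category, score
    return best


def simulate(messages, categories, batch_size):
    """messages: list of (terms, true_category). Returns the number of
    messages categorized correctly.

    batch_size=1 mimics one-by-one mode; batch_size=15 mimics session mode.
    """
    profiles = {c: Counter() for c in categories}  # profiles start empty
    correct = 0
    for start in range(0, len(messages), batch_size):
        batch = messages[start:start + batch_size]
        # Every message of the batch is categorized before any feedback.
        for terms, truth in batch:
            if categorize(profiles, terms) == truth:
                correct += 1
        # Relevance feedback: the user confirms or corrects each message.
        for terms, truth in batch:
            profiles[truth].update(terms)
    return correct
```

With batch_size set to 1 each message benefits from the feedback given on all previous messages, whereas with a daily batch the profiles lag behind by up to one session, which is one plausible reading of why one-by-one mode learns faster.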

A first set of experiments concerned the comparison of these two modes of operation.
The profiles associated with each folder were initially empty, and were incrementally
built only through relevance feedback. Table 2 illustrates the average (over all
the available collections) of precision, recall, and F1 measure [31, 34], where the
results obtained for each category are combined using the micro-average indicator [28].
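As a reminder of how micro-averaging works, the per-category counts of true positives, false positives, and false negatives are summed first, and precision, recall, and F1 are then computed once from the totals. The sketch below shows this standard computation; the function name micro_average and the example counts are made up for illustration and are not data from the experiments.

```python
def micro_average(counts):
    """counts: dict category -> (tp, fp, fn). Returns (precision, recall, f1)."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for two folders.
precision, recall, f1 = micro_average({"work": (40, 10, 10), "spam": (20, 0, 10)})
```

Because the counts are pooled, large categories weigh more than small ones, unlike macro-averaging, where each category contributes equally.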



Table 2. Comparison between session mode and one-by-one mode.

                    Session mode   One-by-one mode
Average Precision   75%            79%
Average Recall      72%            76%
Average F1          74%            78%



First of all, we notice that the values obtained are in the range from 70% to 80%.
Other experiments reported in the literature [18, 28] concern the categorization of the
Reuters-22173 collection (consisting of 21,450 articles, subdivided into 135 categories)
or the Reuters-21578 collection (consisting of 12,902 articles, subdivided into
90 categories): the values obtained for the F1 measure are in the same range between
70% and 80%. We have considered this result as a confirmation of the adequacy of
the baseline performance of ifMail. Furthermore, it should be highlighted that the
values reported in Table 2 are average values, which include also the initial phases,
where errors are most likely to happen: this implies that saturation (‘steady state’)
values can be significantly higher.

Secondly, it can be noticed that precision reaches higher levels than recall. We can
interpret this phenomenon in the following way: the number of messages considered
(i) is capable of reducing the number of categorization errors, but, on the other hand,
(ii) is not sufficient for building profiles that cover all the concepts included in a category
(and some messages are not categorized, i.e., not assigned to any category). Finally,
one-by-one mode outperforms session mode, reaching almost 80% in all
three indicators considered.

With reference to the same experiment, Fig. 5 shows the evolution (over the sequence
of daily sessions and only for collection E) of the F1 measure. Both modes of
operation reach values above 80%. The 70% level (conventionally taken as the value
indicating the termination of the initial learning phase) is reached earlier in the
one-by-one mode. In the long run the two modes of operation reach the same level of
performance.



[Plot; legend: F1 in ‘Session’ modality and F1 in ‘One-by-one’ modality, with moving
averages and logarithmic trend lines for both. Horizontal axis: categorized e-mail, 15 to 585.]

Fig. 5. Micro-average F1 in both operation modes for collection E.

Collections A and B, provided by real users, contained a Spam category, defined
by the two users in order to collect all the ‘not desired’ messages (typically unsolicited
advertising). In Fig. 6, we report the evolution over time of both precision and
recall for the Spam folder of collection B. Precision reaches more than 95% and recall
the range 70%-80%: this can be explained by the fact that when a Spam message is
received, all the subsequent messages concerning the same topic will be detected,
while new Spam topics are not known since they have never been seen before, so they
are left in the inbox, i.e., not categorized. This highlights a significant advantage of our
content-based approach to Spam detection, in comparison with standard anti-Spam systems
based on an archive of spam messages: our system can detect any new Spam message
which concerns topics that have previously been classified as Spam, independently
of other facts (sender or subject already encountered or not).



[Plot; legend: Precision and Recall, with moving averages and logarithmic trend lines
for both. Horizontal axis: categorized e-mail.]

Fig. 6. Precision and Recall for the Spam category of collection B.



Another (expected) phenomenon observed in the experimentation concerns the re- 
lationship between performance and level of specificity of a category: whenever a 
category includes a well defined and limited topic, performance in terms of precision 
and recall is higher, reaching for both indicators the level of 85%. Analogously, for 
such categories, the learning phase is shorter. 

Table 3 illustrates such a situation for some categories with these characteristics.

Other experiments have been focused on the identification of the best threshold to
be employed for alerting. We have seen that using only the Rank value (an integer
ranging from 1 to 5), precision was maximized (over 80%) and that, by increasing the
specific value considered for the threshold, precision was further improved. Fig. 7
shows that the higher the threshold (4 or 5), the steeper the learning curve and the
higher the precision values obtained (several values saturate at 100%).
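The effect of the alerting threshold can be illustrated with a small sketch: a message triggers an alert only if its Rank is at least the threshold, and alert precision is the fraction of alerted messages that the user actually considers important. The function name alert_precision and the sample data are hypothetical and are not taken from the experiments.

```python
def alert_precision(messages, threshold):
    """messages: list of (rank, is_important) pairs.

    Returns (number_of_alerts, precision); precision is None when no
    message reaches the threshold.
    """
    alerted = [important for rank, important in messages if rank >= threshold]
    if not alerted:
        return 0, None
    return len(alerted), sum(alerted) / len(alerted)


# Made-up sample: (Rank assigned by the system, importance judged by the user).
sample = [(5, True), (4, True), (4, False), (3, False), (2, True), (1, False)]
for threshold in (3, 4, 5):
    print(threshold, alert_precision(sample, threshold))
```

Raising the threshold typically trades fewer alerts for higher precision, which is the desirable direction on a mobile device where every interruption has a cost.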

Finally, we have computed a measure of the effort required of the user of ifMail, in
terms of the number of ‘move operations’ of a mail message towards its correct folder
(category). More specifically, we have considered successive groups of 60 messages
(i.e., four days), and we have counted:

• the number of correct system categorization operations (green line on the top part 
of Fig. 8); 

• the number of user moves, i.e., the explicit indication done by the user on a single 
message, since the system was not able to categorize the message correctly (red 
line in Fig. 8). 
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It is interesting to see that, as the user ‘teaches’ to the system how to categorize, 
the system iearns’. After about 70 messages received, the user needs to move about 
50% of the messages to their correct folder. After about 300 messages, the system 
‘has learned’, and it is able to categorize correctly more than 50 messages out of the 
incoming 60, with a missed-categorization rate of less than 16%. 



Table 3. Results for categories with well defined topic.

Collection | Folder               | Precision | Recall | F1
-----------+----------------------+-----------+--------+-----
A          | News                 | 0.91      | 0.83   | 0.87
B          | Students and courses | 0.94      | 0.93   | 0.93
B          | Department news      | 0.85      | 0.91   | 0.88
B          | Seminars             | 0.86      | 0.91   | 0.88
C          | ADSL                 | 0.92      | 0.92   | 0.92
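
The last column of Table 3 is consistent with the usual definition of the F1 measure
as the harmonic mean of precision and recall. The sketch below assumes that
definition (the text does not spell it out) and recomputes the column from the
precision and recall values reported in the table; `f1_score` is an illustrative
helper name.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (collection, folder, precision, recall) as reported in Table 3
rows = [
    ("A", "News", 0.91, 0.83),
    ("B", "Students and courses", 0.94, 0.93),
    ("B", "Department news", 0.85, 0.91),
    ("B", "Seminars", 0.86, 0.91),
    ("C", "ADSL", 0.92, 0.92),
]

for collection, folder, p, r in rows:
    print(collection, folder, round(f1_score(p, r), 2))
```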



[Figure: precision plotted as a moving average for each alerting threshold (Rank 1 to
Rank 5), each with a logarithmic trend line.]

Fig. 7. Precision with different values as alerting threshold for collection F.



5 Conclusions and Future Work 

We have discussed the issues of email categorization, filtering, and alerting. After a
general introduction to the problem and a brief literature survey, we have presented
the ifMail prototype, which is capable of: categorizing incoming email messages into
pre-defined categories; filtering and ranking the categorized messages according to
their importance; and alerting the user on mobile devices when important messages are
waiting to be read. We have also performed an extended evaluation of the ifMail
prototype. The results show the high effectiveness levels reached by the system.



[Figure: number of system categorization and user categorization actions per group of
messages, each with a logarithmic trend line.]

Fig. 8. Comparison of the number of user and system categorization actions (session mode).

We will continue this research in various ways. We are currently working on improving
the ifMail prototype and we plan a more complete evaluation after these improvements.
We intend to deal with privacy issues with a novel approach, by implementing software
capable of analyzing the email archives of users by running on
their computers and simulating the behavior of a categorization algorithm. The
results of the categorization algorithm would then be compared with the hand-made
categorization, and only the comparison results would be made public. This software
should be open source (to guarantee privacy) and could be designed as a framework
capable of hosting any categorization algorithm conforming to some well defined
specifications. To take into account the time characteristics of messages (how long a
message has been in the inbox, how long it has been unread, how long it is since the
user last checked his/her email, how much time the user spent reading it or answering
it, and so on), the software should also be capable of monitoring the user’s
activity for a period of time.
Abstract. In this paper, we describe how we support mobile access to Fischlar-
News, a large-scale library of digitised news content, which supports browsing 
and content-based retrieval of news stories. We discuss both the desktop and 
mobile interfaces to Fischlar-News and contrast how the mobile interface im- 
plements a different interaction paradigm from the desktop interface, which is 
based on constraints of designing systems for mobile interfaces. Finally we de- 
scribe the technique for automatic news story segmentation developed for 
Fischlar-News and we chart our progress to date in developing the system. 



1 Introduction 

The growth in volume of multimedia information, the ease with which it can be pro- 
duced and distributed and the range of applications which are now using multimedia 
information are creating a demand for content-based access to this information. At the
same time, digitised video content is becoming commonplace through the develop- 
ment of DVD movies, broadcast digital TV, and video on personal computers for both 
entertainment and educational applications. Besides the growth in volume of multime- 
dia content, we can also observe an increasing and complex range of user scenarios 
where we require content-based access to such information. Users require access when
in a desktop environment, but also, we believe, when using wireless devices in a
mobile scenario, each of which will require different access methodologies to be
employed. In this paper we discuss mobile access to a video archive of digitised news
programs, which can be accessed using desktop devices, PDAs operating on a wireless
LAN or XDAs on a GPRS¹ mobile phone network. In this way, and through these
different access devices, we support mobile access to a digital video library of
broadcast news. Our belief is that mobile users have a demand for wireless access to
news content.



¹ GPRS is a packet switching technology for GSM mobile phone networks. A GPRS
connection is ‘always on’ and a single user connection allows 21.4 Kbps, but combining
connections (time slots) can reach a theoretical speed of 171.2 Kbps. However, there
is a limited number of time slots on a GPRS network.

F. Crestani et al. (Eds.): Mobile and Ubiquitous Info. Access Ws 2003, LNCS 2954, pp. 124-142, 2004.

© Springer-Verlag Berlin Heidelberg 2004 
In addition to simply providing access to digital video archives across a wireless
network, we are also working on new methodologies for presenting information to
mobile users. In this paper we report on our work on developing an information
retrieval system (which supports mobile access) for one type of multimedia information
(digital video), of one type of video genre (broadcast TV news) and targeted at one
type of user information need, namely that of a user of Fischlar-News who is not
necessarily interested in viewing all the news, but wishes to be kept up-to-date with
developing news stories of interest without being restricted to always using a desktop
device (i.e. mobile access).

Built on an existing system [1], but incorporating mobile access to daily
news video, Fischlar-News is based on two new and key underlying technologies:

- automatic news story segmentation, and;

- personalisation by means of news story recommendations tailored to the interests
of individual users.

In this paper we describe mobile access to the Fischlar-News system and how the
fully-automated version of the system operates. We begin in section 2 by describing
the desktop version of Fischlar-News (incorporating news story segmentation), which
is built upon a news retrieval system that has been operational for the last 2 years
within the university campus. Section 3 introduces mobile access to Fischlar-News, and
discusses the different interaction paradigm that is required for mobile access when
compared to Fischlar-News on the desktop. We also discuss how personalised presentation
of news stories is being incorporated into the Fischlar-News system to support mobile
access. In section 4 we describe how Fischlar-News actually works, and we discuss
automatic story segmentation and how recommendation and personalisation are
achieved. Finally, in section 5, we discuss our progress to date with the development
of the system, describe a transitional system that we used during development and
outline our future plans for Fischlar-News.



2 Fischlar-News Video Archive 

The Fischlar-News Video archive is one of the results of research in analysis,
browsing and searching of digital video content carried out at the Centre for Digital
Video Processing in Dublin City University. It is one of four versions of a digital
video archive system that we maintain within the centre. Fischlar (all four versions)
is an MPEG-7 based digital video content management and retrieval system which supports
digital video browsing, searching and on-demand playback using both fixed and mobile
devices. The four versions of Fischlar are Fischlar-TV, Fischlar-News, Fischlar-TREC
(2002 & 2003) and Fischlar-Nursing. At the time of writing, Fischlar has over
2,500 registered users, of whom about half are “active” and regular users.

The Fischlar-TV system has been in operation on the university campus for over
three years and can be accessed via a web browser on a desktop computer. Registered
users, who perceive it as a web-based video recorder, have been using the system to
record and watch TV programmes from both the university campus residences and from
computer labs [1]. The Fischlar-Nursing system provides access to a closed set of
thirty-five educational video programmes on nursing, and is used by staff and students
of the university’s nursing school, while the Fischlar-TREC systems were developed
for our participation in the interactive search task of the annual activity at the TREC
Video Track, in both 2002 and 2003 [2].

Fischlar-News, the focus of this paper, automatically records the thirty-minute,
9pm, main evening news programme every day from the Irish national broadcaster
RTE1 and thus has only TV news programmes in its collection. With its web-based
interface, the system is accessible with any conventional web browser and is now also
accessible from mobile devices. Currently several months of recorded daily RTE1
news are online within the Fischlar-News archive (with a total of two years’ news
archived). This archive is made available to university staff and students, and is
conveniently accessible from any computer lab, library or residence within the
campus. We have chosen the Fischlar-News application as our test-bed for providing
mobile access to our Fischlar systems.

In order to facilitate accessing Fischlar-News from a number of different devices 
(both desktop and mobile based), the entire Fischlar system is based on XML tech- 
nologies, which by incorporating XSL transformations for each new device required, 
can easily be extended to incorporate new access methodologies, devices and stan- 
dards. Fig. 1 shows the basic architecture of the Fischlar-News system which illus- 
trates both desktop and mobile access and the process by which automatic news story 
segmentation takes place. 
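
The idea of keeping one XML description and adding one transformation per device can
be sketched as follows. This is only an illustration under stated assumptions: the
XML element and attribute names are invented (they are not the actual Fischlar
schema), and plain Python functions stand in for the XSL stylesheets because the
Python standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for one programme; names are illustrative only.
PROGRAMME_XML = """
<programme date="2003-05-29">
  <story id="1" keyframe="kf1.jpg">
    <description>A survey of property prices has predicted a slump.</description>
  </story>
  <story id="2" keyframe="kf2.jpg">
    <description>Second story of the evening.</description>
  </story>
</programme>
"""


def render_desktop(programme):
    """Full listing for a web browser, linking to MPEG-1 playback."""
    rows = []
    for story in programme.findall("story"):
        rows.append(
            '<div><img src="{kf}"/> {text} '
            '<a href="play?story={sid}&amp;format=mpeg1">PLAY THIS STORY</a>'
            '</div>'.format(
                kf=story.get("keyframe"),
                text=story.findtext("description"),
                sid=story.get("id"),
            )
        )
    return "\n".join(rows)


def render_mobile(programme, max_chars=40):
    """Compact listing for a PDA/XDA, linking to RealVideo playback."""
    rows = []
    for story in programme.findall("story"):
        text = story.findtext("description")[:max_chars]
        rows.append(
            '<a href="play?story={sid}&amp;format=realvideo">'
            '<img src="{kf}"/> {text}</a>'.format(
                sid=story.get("id"), kf=story.get("keyframe"), text=text
            )
        )
    return "\n".join(rows)


# One renderer per device class; a new device only needs a new entry here.
RENDERERS = {"desktop": render_desktop, "pda": render_mobile, "xda": render_mobile}


def render(xml_text, device):
    programme = ET.fromstring(xml_text.strip())
    return RENDERERS[device](programme)


print(render(PROGRAMME_XML, "xda"))
```

The design point is that the story data is produced once, and only the final
presentation step differs between devices.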




In Fischlar-News, mobile access to the news archive is supported for both PDAs
(Compaq iPAQ on a wireless LAN) and XDAs, each of which plays RealVideo encoded
content, which has been encoded at 20 Kbps in order to support streaming across
a mobile phone network to an XDA. In a desktop environment, a user can use a
conventional web browser (using MPEG-1 video streaming) as shown in Fig. 1. The
inclusion of XDA support (using a GPRS connection) allows us to prototype a version
of our Fischlar-News system in a truly mobile environment, where access is dependent
only on the availability of a GPRS connection.

In realising such mobile device interaction for Fischlar-News, two essential
technologies are required, namely the segmentation of news programs into a collection
of news stories and a facility to automatically recommend these news stories to
individual users based on their preferences. We will discuss these aspects of the
system in later sections of this paper. Previous versions of Fischlar-News have
focussed on providing browsing and search support at the shot level by automatically
segmenting captured video content into its constituent shots and presenting video to
the user as a collection of these shots. However, our current system (discussed in this
paper) incorporates search and retrieval of content at the news story level, which we
feel is more intuitive to a user than the shot level because a news story is a
self-contained and logical unit of data and is more likely to be of benefit to a user
than a full news program or a single camera shot from a news program.



2.1 Content Access to the Fischlar-News Archive

When using Fischlar-News on a desktop device, there are a number of ways of accessing
news stories, described in the next sections.



2.1.1 Browsing News by News Program 

This is the basic level of access in Fischlar-News and is shown in Fig. 2. As can be
seen, a listing of news programs grouped by month is displayed on screen.
Currently this list extends to include news content from April 2003. Selecting any
news program will display a list of the news stories from that program (Fig. 2). Each
news story is represented by a keyframe (chosen so as to contain the anchorperson and,
if possible, an image in the background associated with the story) and a textual
description of the story.

Fig. 2. Fischlar-News (with stories from one program)

When presented with a listing of news stories there are two options available to the
user, the first of these being to play back the news story by clicking on the “PLAY
THIS STORY” link, which will commence playback (in a new window) from this
point onwards (Fig. 3).




Fig. 3. Playing back a news story 

Alternatively, when presented with a listing of news stories the user may examine
the news story at the shot level by clicking on either the keyframe or the numbered
news story title. If this option is taken the user is presented with a detailed listing
of all the camera shots which have been automatically extracted from that story, as
well as the closed caption² text that is associated with that story, as shown in
Fig. 4. In this way the user can browse through the content of a given story. Clicking
on any of the keyframes will commence playback from that point.

However, given that the Fischlar-News archive extends to include several months
of news programs, with an additional two years archived, and is growing daily,
supporting user navigation throughout this archive of many thousands of stories is
essential. The desktop version of Fischlar-News supports a user searching through news
stories based on textual content and browsing through the news story archive by
following automatically generated links between news stories. We discuss both search
and linkage now.



² Closed caption text (or teletext) is a textual description of the spoken content of a
programme that accompanies certain programmes when broadcast. Most programmes
transmitted on TV now have associated closed-caption text.

Fig. 4. Shot-level browsing of a news story



2.1.2 Content Searching for News Stories 

Given that there are a large number of stories in the Fischlar-News system, one of our 
support measures is content-based search and retrieval of news stories. This is 
achieved by representing each news story by a textual description, which has been 
automatically extracted from the closed caption text and subsequently supporting user 
queries over the story archive. This facilitates content-based retrieval of news stories 
based on textual queries. For example, in Fig. 5, a query “house prices” has been pre- 
sented to the Fischlar-News system. 

The results of the search (166 news stories) are presented on the right side of the 
screen in decreasing order of relevance. Once again, clicking on ‘PLAY THIS 
STORY’ commences playback and clicking on the story title or keyframe takes the 
user to a shot listing. However, when a list of relevant news stories is presented to the 
user, another option exists which allows a user to view the story in the context of that 
day’s news by following the date link which displays a listing of news stories from the 
news program recorded on that date. 
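
As an illustration of ranked text search over story descriptions, the sketch below
scores each story against a query with a simple TF-IDF weighting and returns story
ids in decreasing order of score. The weighting scheme is an assumption made for the
example (the text does not say which retrieval model Fischlar-News uses), and the
three stories are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(stories, query):
    """Return story ids ranked by a simple TF-IDF score (zero scores dropped)."""
    docs = {sid: Counter(tokenize(text)) for sid, text in stories.items()}
    n_docs = len(docs)
    scores = {}
    for sid, counts in docs.items():
        score = 0.0
        for term in tokenize(query):
            doc_freq = sum(1 for c in docs.values() if term in c)
            if doc_freq:
                score += counts[term] * math.log(1 + n_docs / doc_freq)
        if score > 0:
            scores[sid] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical story descriptions derived from closed caption text.
stories = {
    "s1": "House prices are expected to fall as a survey predicts a slump",
    "s2": "The minister announced new funding for schools",
    "s3": "Rising prices in the shops worry consumers",
}

print(search(stories, "house prices"))
```

Here the story matching both query terms ranks above the story matching only
“prices”, and the unrelated story is not returned at all.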

The third access methodology employed in Fischlar-News is to follow automatically
generated links between related stories.

2.1.3 Following Automatically Generated Links between News Stories 

Using the closed caption transcripts for a given news story it is possible not only to
provide for text based search and retrieval of news content at the story level (as just
described), but also to identify similar stories to any one given story, and to provide
the facility for content-based linkage of news stories using only the closed caption
transcripts. Therefore, on a desktop device, for any story that a user is currently
accessing, Fischlar-News automatically generates a ranked list of story-links to the
ten most similar news stories, which we refer to as ‘Related Stories’ (see Fig. 6,
which shows a listing of stories related to a story about house prices).

Fig. 6. Illustrating related stories
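
A ranked list of related stories can be sketched with a bag-of-words cosine
similarity over the transcripts. This is a minimal illustration that assumes cosine
similarity as the measure (the text does not name the one actually used); the story
texts and the helper names are invented.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def related_stories(stories, story_id, k=10):
    """Ids of the k stories most similar to story_id, most similar first."""
    target = bag_of_words(stories[story_id])
    scored = [(cosine(target, bag_of_words(text)), sid)
              for sid, text in stories.items() if sid != story_id]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sid for score, sid in scored[:k] if score > 0]


# Hypothetical transcripts.
stories = {
    "a": "house prices fall in dublin",
    "b": "dublin house prices rise again",
    "c": "football final in dublin",
    "d": "budget speech tomorrow",
}

print(related_stories(stories, "a"))
```

The default `k=10` mirrors the ten ‘Related Stories’ mentioned above; stories with
no terms in common are left out rather than padded into the list.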
2.2 Gathering User Feedback

In order to provide personalisation and recommendation to users accessing the system
using mobile devices, we gather user feedback and preferences when the user accesses
Fischlar-News from the desktop environment. At any point while browsing either the
archive or a particular news story (on a desktop device), the user is presented with the
opportunity to rate a particular story on a five point scale from “do not like” (thumbs
down) to “like very much” (thumbs up). This can be seen in Fig. 7 where a user has
rated a news story as being one that she likes very much (a large thumbs up). In this
way we explicitly capture a user’s preferences for news topics that they are interested
in. This will allow us to match individual users together based on complementary
preferences and to recommend news stories for a user based on this collaboration
graph.





[Figure: screenshot of a news story dated 29 May 2003 (“A survey of property prices has
predicted a slump in the property market here. ...”) with the rating thumbs and a
PLAY THIS STORY link.]

Fig. 7. The five-point story rating scale



In addition to this process of explicitly gathering data from a user, usage data is 
automatically gathered (on the desktop device) as the user plays back news stories or 
browses news stories. This information is then used (along with the explicitly gathered 
data) for recommending news stories to users. So, for example, if a particular user 
liked news stories on a given topic, and watched these stories, then additional news 
stories could be recommended to the user based on the viewing habits, or user ratings, 
of similar users. These recommendations are used as one of the two primary access 
mechanisms for the mobile version of the Fischlar-News system.



3 Mobile Access to Fischlar-News 

Small display size, awkward methods of data input and distracting environments have
been noted as major constraints in designing systems for mobile platforms [3, 4, 5].
For example, a typical mobile device, the Compaq iPAQ, has a 3.8" TFT screen which
operates at a resolution of 240 x 320 (portrait orientation) in 16-bit colour. Compare
this to a conventional desktop device: besides larger storage and memory and a faster
processor, any such device (in recent years) supports a resolution of at least
1024 x 768 (with 800 x 600 as a standard safe resolution for design), 24-bit colour
and a 15" diagonal display in landscape orientation.

In order to stream video to such mobile devices, taking into account resolution
issues and bandwidth (we accommodate GPRS at 21.4 Kbps as a minimum), the entire
video must be downsized from the MPEG-1 (352 x 288) resolution at 25 fps used for
Fischlar-News on the desktop to RealVideo format (156 x 128) at 30 fps. This equates
to 13.5 Kbps for the video and 6.5 Kbps for the audio data. MPEG-1 streaming for the
desktop requires about 1 Mbps.
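
The figures quoted above can be checked with a few lines of arithmetic. The sketch
uses only numbers from the text; the one assumption is that “about 1 Mbps” is taken
as 1000 Kbps for the purpose of the estimate.

```python
GPRS_MIN_KBPS = 21.4        # minimum GPRS rate accommodated
VIDEO_KBPS = 13.5           # RealVideo video stream
AUDIO_KBPS = 6.5            # RealVideo audio stream
DESKTOP_MPEG1_KBPS = 1000   # "about 1 Mbps", assumed to be 1000 Kbps

# Total mobile stream and what is left of the minimum GPRS rate
stream_kbps = VIDEO_KBPS + AUDIO_KBPS
headroom_kbps = GPRS_MIN_KBPS - stream_kbps

# How much smaller the mobile frame and bitrate are than the desktop ones
pixel_ratio = (352 * 288) / (156 * 128)
bitrate_ratio = DESKTOP_MPEG1_KBPS / stream_kbps

print(f"mobile stream: {stream_kbps:.1f} Kbps, headroom: {headroom_kbps:.1f} Kbps")
print(f"desktop frame has {pixel_ratio:.2f}x the pixels")
print(f"desktop stream needs {bitrate_ratio:.0f}x the bitrate")
```

The combined stream is 20 Kbps, which fits under the 21.4 Kbps minimum with little
to spare, while the desktop stream needs roughly fifty times the bitrate for about
five times the pixels.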




132 Cathal Gurrin et al. 



Consequently, there have been suggestions on devising different interaction paradigms
suitable for the mobile environment rather than simply following the conventional
direct manipulation interfaces successfully used in desktop platforms [6], [7],
[8]. More and more qualitative studies are appearing which help us better understand
how people use and interact with mobile devices, and the kinds of context they
experience when doing so [9], [10], [11]. The general consensus is that a mobile
interface should require a different interaction style from that of the GUI desktop
interface, and that attempts to replicate all the functionality of a desktop system in
a mobile device are a mistake [12], [7], [6], [3].

Though the current literature alerts us to the fact that we do not have any established
or known methodology on which to base an interface design for a mobile platform, a
number of rough design guidelines have been suggested based on the experiences of
individual researchers. These include the following:

- minimise user input where applicable: provide simple user selections such as yes/no
options, simple hyperlinking by tapping, etc. instead of asking the user to articulate
a query or to use visually demanding browsing that requires careful inspection of the
screen,

- filter out information so that only a small amount of the most important information
can be quickly and readily accessed via the mobile device (e.g. use of automatic
recommendation as provided in the Fischlar-TV system [21]),

- proactively search and collect potentially useful pieces of information for a user
and point these out, rather than trying to provide full coverage of all information via
an elaborate searching/browsing interface.

In terms of developing any system for a mobile device which is to support searching
and information retrieval tasks, all these guidelines point to more pre-processing
on the system’s side in order to determine what information a particular user will most
likely want to see. This encourages the development of systems that proactively
recommend a particular piece of information (or pointers) to the user, and consequently
demand less interaction on the user’s part. This aspect is even more important in the
case of information retrieval from a video archive, where browsing is such an important
component of video access. What all this means is that in the development of
search systems to be accessed from mobile platforms, the information retrieval
functionality should be hidden from the user as much as possible, and should form part
of the data pre-processing. In supporting mobile access to the Fischlar-News archive,
our approach has been in line with these guidelines by incorporating the personalised
list of news stories as the primary access point for mobile users and providing a
personalised window on these news stories based on each user’s individual preferences.
Secondary access points include archive browsing.

Fig. 8 illustrates the logical breakdown of news programs into stories and associated
shots based on keyframes. It is our belief that story based presentation can be
supported using both mobile and desktop devices; however, if finer granularity of
retrieval is required (shot level browsing within stories) then desktop devices are
essential due to interaction design methodologies for mobile devices [13] as well as
the bandwidth limited nature of some such devices, e.g. the XDA we use to prototype our
mobile access.

Fig. 8. A logical breakdown of news programs



Given that user interaction with a mobile device should be limited to a subset of the
functionality of the desktop version for reasons outlined in the previous section, the
functionality of the mobile device is to support two methods of using Fischlar-News:

- providing personalised access to the news archive by presenting the user with a
listing of news stories of interest to the user (Section 3.1), or;

- supporting user access to news stories in the archive by browsing the reverse
chronologically ordered listing of news programmes (i.e. Programme Browsing in
section 3.2).

3.1 Personalised Presentation of News Stories 

The primary access mechanism for the mobile device is based on personalisation of
news stories tailored to individual user preferences. Each user’s personalised view of
the news archive is based on similarity of program content to previously rated
programs and also on the concept of collaborative filtering. Collaborative filtering,
in what is perhaps its most famous form, is employed by Amazon.com when making user
recommendations based on a user’s previous purchases or recently viewed items. In the
case of Fischlar-News, collaborative filtering is employed based primarily on
previously gathered user ratings of any given news stories as well as news story usage
histories. We will outline our collaborative filtering mechanism in greater detail in
section 4.
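
To make the idea of rating-based collaborative filtering concrete, here is a minimal
user-based sketch. It is an illustration under stated assumptions rather than the
Fischlar-News mechanism: it uses only explicit ratings on the five-point scale
(no usage histories or content similarity), the similarity function is a simple
choice made for the example, and the users and ratings are invented.

```python
def similarity(a, b):
    """Similarity of two users from their co-rated stories (1.0 = identical)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_diff = sum(abs(a[s] - b[s]) for s in shared) / len(shared)
    return 1.0 / (1.0 + mean_diff)


def recommend(ratings, user, top_n=3):
    """Stories the user has not rated, ordered by predicted rating."""
    mine = ratings[user]
    totals, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = similarity(mine, theirs)
        if sim == 0.0:
            continue
        for story, rating in theirs.items():
            if story in mine:
                continue
            totals[story] = totals.get(story, 0.0) + sim * rating
            weights[story] = weights.get(story, 0.0) + sim
    # Similarity-weighted average of the other users' ratings
    predicted = {s: totals[s] / weights[s] for s in totals}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


# Hypothetical ratings on the 1..5 scale.
ratings = {
    "ann": {"housing": 5, "budget": 4},
    "bob": {"housing": 5, "budget": 4, "property-tax": 5, "football": 1},
    "cat": {"housing": 1, "budget": 2, "football": 5, "property-tax": 2},
}

print(recommend(ratings, "ann"))
```

Because ann’s ratings agree with bob’s and disagree with cat’s, bob’s opinions carry
more weight and the property-tax story is recommended ahead of the football story.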

Upon accessing Fischlar-News using a mobile device, a user has the option of being
presented with a personalised listing of recent news stories (see Fig. 9) which, it is
hoped, will be of interest to that user, based on program content and the output of the
collaborative filtering process. Each story in this list will be represented by a short
description (similar to the desktop device), generated from the closed caption text,
and a keyframe. The only input required from the user is to select a news
story to play back (Fig. 10), which causes the story to be streamed in RealVideo
format.

Fig. 9. Personalised story recommendations

Fig. 10. Playback on a mobile device



By incorporating this personalisation aspect of Fischlar-News on a mobile device
we are minimising user input by filtering out content that the user may not be
interested in, where this filtering is based on news story rating data and content
similarity from the desktop device. For a more complete rationale for the interaction
design approach taken and the detailed considerations for this particular interface
for a PDA, see [13].



3.2 Programme Browsing 

An alternative to personalised news story presentation is provided, to enable a user to 
access news programmes regardless of their presence or absence in the personalised 
list. In this way, a user is not limited to only viewing stories that the system chooses, 
but is presented with a reverse chronological listing of recorded news programmes 
(not unlike the desktop interface in Section 2) so that the user may browse the entire 
news story archive (Fig. 11). Upon selecting a news programme, the user is presented 
with a listing of news stories from within that programme (Fig. 12), in a similar man- 
ner to the listing of personalised stories, with each news story represented by the key- 
frame and the textual description of the story. 



4 Fischlar-News, How It Works 

In realising such mobile device interaction for Fischlar-News, two essential technolo- 
gies are required, namely the segmentation of news programs into a group of news 
stories and a facility to automatically recommend these news stories to individual 
users based on their previously gathered preferences and the preferences of others. In 
the following sections we describe automatic story-based news video retrieval and the 
mechanisms we employ for automatic recommendation. 
Fig. 11. Reverse chronological daily news listing on a mobile device

Fig. 12. Story listing on a specific date



4.1 Automatic Story Segmentation 

As we have stated, Fischlar-News operates over news stories as the primary unit of
retrieval, which is especially important in the mobile environment, but this requires a
method of segmenting an entire news programme into a listing of its constituent news
stories. If done manually this is a time-consuming task; if done automatically,
which is essential for any large-scale archive of story-based news content (such as
Fischlar-News), it is an extremely difficult task. However, news programs
from one broadcaster (RTE in our case) represent a very constrained domain, which
makes automatic story segmentation somewhat easier to accomplish. For example,
there are a lot of features (sources of evidence) that can be extracted automatically
from the video stream to aid the segmentation process, if it is known in advance what
to look for (i.e. in a constrained domain). We have tested and integrated into
Fischlar-News an automatic news story segmentation system that is based on a
combination of a number of different sources of evidence automatically extracted from
the digitised news video.

There has been previous research in this area (Table 1), upon which we now report. 



Table 1. Comparison of Approaches to Automatic Segmentation of News Stories

System | Visual evidence | Audio evidence | Closed caption evidence | Combination method
-------+-----------------+----------------+-------------------------+-------------------
Informedia [14] | Shot boundary detection; face detection; OCR; black frame | Speech recognition; silence detection; acoustic environment change; signal-to-noise ratio | Absence of text | Step-by-step (ad hoc)
VISION [15] | Shot boundary detection | Audio energy-based shot merging | Absence of text; word identification for topic distance calculation | Step-by-step (shot boundary detection followed by audio-based merging followed by closed caption-based adjustment)
BNE & BNN [16] | Black frame; logo detection; anchor booth & reporter scene detection | Silence | Named entity heuristics in captions, “>>>” | Finite state automaton enhanced with time transitions
Topic Browser [17] | No | No | Morphological analysis | (only closed caption used)
ANSES [18] | Shot boundary detection | No | Lexical chaining | Step-by-step (shot boundary detection followed by lexical chaining-based merging)
Fischlar-News | Shot boundary detection; face detection | No | No | Support Vector Machine

4.1.1 Previous Research 

Many of the current studies in news story segmentation make use of multiple sources
of evidence for segmentation, drawn from the visual content, audio content and closed
caption text associated with a particular news programme. Visual evidence currently
studied and used includes shot boundaries (helping to identify possible story
boundaries), blank frames (indicating story boundaries) and the anchorperson
(indicating the start/end of stories). Audio evidence studied includes the existence of
speech/music, silence (indicating story boundaries) and audio energy level. Closed
caption text, often used as the primary source of evidence, has been more extensively
studied, with linguistic analysis used to detect news story boundaries. Evidence in
closed caption text includes simple clues such as complete absence of the closed
caption (an indication of a commercial break), welcome phrases such as “hello and
welcome” (indicating the start of the news), “back to you in <location>” (indicating
reporter to anchorperson), etc. and the manual markings³ “>>” (indicating speaker
change) or “>>>” (indicating story change), as well as sophisticated topic change
detection by lexical chaining analysis. Using only closed caption analysis for news
story segmentation also gives acceptable results [18]. Combining the individual sources
of evidence into a more reliable story segmentation is done in different ways, but most
often follows sequential processing, in which visual analysis (shot boundary detection)
is followed by audio analysis (merging back related shots), as done in [14, 15, 17],
or uses a state transition map to classify different states of scene changes in news
programmes [16]. Table 1 (above) shows a summary of the analysis methods and
combination methods used in six news video retrieval systems, including Fischlar-News.
In the Fischlar-News system, we use an SVM (Support Vector Machine) to combine various
sources of evidence automatically extracted from the video content.



³ Unfortunately not all closed caption broadcasts contain such manual markings. RTE1
news is one such example. Even if such manual markings were available, most closed
caption text is not perfectly aligned with the video content and must be realigned in
order to produce accurate story segmentation results.

4.1.2 Automatic Story Segmentation in Físchlár-News

For automatic news story segmentation, we analyse various visual features of the news
programmes to determine story boundaries automatically. This involves anchorperson
detection using shot clustering [19], which detects when an anchorperson is on screen;
advertisement detection [20], which determines when advertisements occur; and face
detection [21], which detects human faces in the video content. In addition, we are
considering the use of speech/music discrimination [22, 23].

All of the analysis techniques mentioned above for automatic story segmentation 
take place at the shot level (recall Fig. 1) and have been combined to create an auto- 
matic story segmentation system. The output from the advertisement break detection 
algorithm is used to pre-process the shots, discarding as candidates for story bounda- 
ries any shots which are part of an advertisement break. 
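
The pre-processing step described above can be sketched as follows. This is a minimal
illustration rather than the Físchlár-News implementation: the Shot record, the
representation of advertisement breaks as (start, end) times in seconds, and the function
name candidate_shots are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """A hypothetical shot record with start and end times in seconds."""
    index: int
    start: float
    end: float


def candidate_shots(shots, ad_breaks):
    """Return the shots that do not overlap any advertisement break.

    ad_breaks is a list of (start, end) pairs, assumed to come from an
    advertisement detector.
    """
    def in_ad_break(shot):
        return any(shot.start < b_end and shot.end > b_start
                   for b_start, b_end in ad_breaks)

    return [shot for shot in shots if not in_ad_break(shot)]


shots = [Shot(0, 0.0, 10.0), Shot(1, 10.0, 20.0), Shot(2, 20.0, 30.0)]
remaining = candidate_shots(shots, ad_breaks=[(10.0, 20.0)])
```

Only the shots left in remaining would then be passed on as story boundary candidates.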

The outputs of the other analyses are combined using Support Vector Machines [24], and
initial results suggest that this technique can combine these diverse analyses effectively
and efficiently. Each shot in a news programme is described by a feature vector made up of
the outputs from the various analysis tools, and the Support Vector Machine is trained to
classify shots into those which signal the start of a new story and those which do not.
This allows us to detect story boundaries in a TV news programme.
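
To make the idea of classifying per-shot feature vectors concrete, the sketch below trains
a small linear SVM on toy data. It is an illustration under stated assumptions, not the
system's classifier: the two features (anchorperson detected, face detected), the four
training examples and all parameter values are invented, and the SVM is trained with a
simple subgradient method on the hinge loss so that the example needs no external library.
The text does not specify the kernel, features or training tool actually used.

```python
# Hypothetical feature vectors: [anchorperson_detected, face_detected].
# Labels: +1 = the shot starts a new story, -1 = it does not.
TRAINING_SHOTS = [
    ([1.0, 1.0], +1),
    ([1.0, 0.0], +1),
    ([0.0, 1.0], -1),
    ([0.0, 0.0], -1),
]


def decision_value(weights, bias, features):
    """Signed distance-like score of a shot from the separating hyperplane."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train_linear_svm(examples, epochs=200, learning_rate=0.1, regularisation=0.01):
    """Train a linear SVM by subgradient descent on the regularised hinge loss."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            margin = label * decision_value(weights, bias, features)
            # L2 regularisation shrinks the weights slightly on every step.
            weights = [w * (1.0 - learning_rate * regularisation) for w in weights]
            if margin < 1.0:
                # The example violates the margin: update to increase its margin.
                weights = [w + learning_rate * label * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * label
    return weights, bias


def is_story_boundary(weights, bias, features):
    """Classify a shot as a story boundary if its score is non-negative."""
    return decision_value(weights, bias, features) >= 0.0


WEIGHTS, BIAS = train_linear_svm(TRAINING_SHOTS)
```

In this toy data the anchorperson feature alone separates the two classes, so the learned
weight on it dominates; real feature vectors would be longer and noisier.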

Before an SVM can be used, it must be trained. We did this using a training set of 435
example shots, of which 86 are positive examples of news story boundaries and 349 are
negative examples. We then tested the SVM on a small test set of six news programmes, with
very promising results: precision and recall figures of 1.0 and 0.859 respectively. We
appreciate that this test set is small, and we are currently testing the SVM for story
boundary segmentation on a larger test set of news programmes as part of the TREC Video
Track 2003, which will give us a better indication of SVM performance. The automatic
segmentation system has been operational since October 2003, when it replaced a temporary
system which used manual story segmentation.
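
For reference, precision and recall for boundary detection can be computed as in the sketch
below. The shot indices are made up for illustration and are not the test data reported
above; the function name is an assumption.

```python
def precision_recall(detected, actual):
    """Precision and recall of detected story boundaries against ground truth.

    Both arguments are collections of shot indices marked as story starts.
    """
    detected, actual = set(detected), set(actual)
    true_positives = len(detected & actual)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Every detected boundary is correct, but one true boundary is missed.
precision, recall = precision_recall(detected=[3, 10, 17], actual=[3, 10, 17, 25])
```

A precision of 1.0 with recall below 1.0, as in the reported figures, means that no false
boundaries were produced but some true boundaries were missed.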

For each automatically segmented news story, a textual description will be extracted from
the closed-caption text and a keyframe will be extracted automatically. Our belief is that
the (single) keyframe chosen to represent each news story should (where available) contain
the anchorperson as well as a background image which represents the story. In order to
achieve this automatically, we will incorporate both temporal and anchorperson detection
knowledge.

4.2 Físchlár-News Story Recommendation and Personalisation

Given that we have developed the Físchlár-News system with a mobile user in mind, the most
important news stories for a mobile user should be presented with minimal user intervention
or data input. To facilitate this, we have put great emphasis on supporting news story
recommendation and personalisation. In a desktop environment, Físchlár-News supports these
features along with story-based retrieval using textual queries and story linkage. In a
mobile environment, however, personalisation and recommendation are central to user
interaction with Físchlár-News, which helps to address some of the major constraints in
designing systems for mobile platforms [3, 4, 5]. One highly important aspect of this
personalisation and recommendation is Collaborative Filtering.

4.2.1 Collaborative Filtering 

In another application, Físchlár-TV, we have been using the ClixSmart engine [25] to
provide collaborative filtering-based recommendations of TV programmes, both for recording
and for playback from those already recorded and available in the Físchlár-TV library. The
ClixSmart engine is a collaborative filtering system that recommends items based on the
actions of similar users. For additional information on how collaborative filtering works
within the Físchlár system in general, see [26].

In Físchlár-News, personalisation is based on a combination of the content similarity of
news stories and collaborative filtering. As stated, Físchlár-News on a mobile device
filters out news stories that are unlikely to interest the user, based on past history, in
addition to supporting temporal browsing of stories from within the news archive. For the
collaborative filtering aspect of personalisation to be effective, all of the required data
must be gathered by the system from the desktop interface. The data gathered is as follows:

- explicit user ratings, as described previously in section 2.2,

- usage data on a per-user basis from story playback logs,

- usage data on a per-user basis from story access logs.

This data is gathered automatically while a user uses the desktop interface and is used to
populate a story-by-user matrix, which is used in the collaborative filtering process. The
mobile interface therefore works in parallel with, and is supported by, the desktop
interface: the desktop interface collects and processes the user data that supports
personalisation in the mobile environment.
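
As a rough illustration of how a story-by-user matrix can drive recommendations, the sketch
below implements a simple user-based collaborative filter. It is not the ClixSmart
algorithm, whose details are not given here: the sparse dictionary layout, the cosine
similarity measure, the +1/-1 rating scale and the user and story names are all assumptions
made for the example.

```python
import math

# Hypothetical sparse story-by-user data: user -> {story_id: rating}.
# +1 could stand for a thumbs-up or a playback, -1 for a thumbs-down.
RATINGS = {
    "alice": {"s1": 1, "s2": 1},
    "bob": {"s1": 1, "s2": 1, "s3": 1},
    "carol": {"s1": -1, "s4": 1},
}


def cosine_similarity(u, v):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[s] * v[s] for s in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(ratings, user, top_n=3):
    """Rank unseen stories by the similarity-weighted ratings of similar users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        if similarity <= 0:
            continue  # ignore users with dissimilar or unrelated tastes
        for story, rating in other_ratings.items():
            if story not in ratings[user]:
                scores[story] = scores.get(story, 0.0) + similarity * rating
    return sorted(scores, key=lambda s: (-scores[s], s))[:top_n]


recommended = recommend(RATINGS, "alice")
```

Here bob agrees with alice on the stories they share, so his extra story is recommended,
while carol disagrees with alice and contributes nothing.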



5 Conclusion 



In this paper we have described our efforts at supporting mobile access to a large-scale
library of digital video news content. Our efforts have focused on incorporating story
segmentation and personalisation into the Físchlár-News system in order to support access
using mobile devices. This mobile access is made possible by careful development of the
Físchlár-News system, its interface, and its browsing and retrieval methodologies to
support the bandwidth-limited, screen-size-limited mobile user on devices such as XDAs on a
GPRS mobile network.

These mobile devices have a number of key features which limit how we interact with them.
These include small display size, awkward methods of data input and, in some cases (such as
the XDA), limited bandwidth. Conventional wisdom suggests that different interaction
paradigms should be devised for the mobile environment, rather than simply following the
conventional direct manipulation interfaces used successfully on desktop platforms. Mobile
access should require minimal user input and should present filtered information based on
background data collection, so that a small amount of the most important information can be
accessed quickly and readily via the mobile device. Table 2 summarises the interaction
mechanisms on the mobile and desktop devices for Físchlár-News.



Table 2. Summary of Desktop vs. Mobile Device Interaction

Interaction         | Desktop Device | Mobile Device
Programme Browsing  | Y              | Y
Story Browsing      | Y              | Y
Shot Browsing       | Y              | N
Text Querying       | Y              | N
Related Stories     | Y              | N
Personalisation     | N              | Y



In Físchlár-News, the mobile interface works in parallel with, and is supported by, the
desktop interface: the background data needed for personalisation and recommendation is
mined by observing user activities in the desktop environment and is then used to support
personalisation in the mobile environment.

5.1 Our Progress to Date 

In realising the underlying technology required for story-based and recommendation-based
mobile access to the news story archive, we built, and for a number of months used, a
manually segmented version of Físchlár-News to kick-start the fully automated system that
has been in operation since October 2003. From April 2003 until October 2003, recorded daily
news programmes were manually segmented into stories (in XML format) to generate an initial
library of news stories. The manually marked XML files were uploaded into the system, which
then incorporated them into the archive. Manual segmentation is a time-consuming process in
which each news story identified in a given programme is represented in XML format by the
following information: a start time and end time, a representative keyframe, and
representative text describing the story, extracted from the closed captions.

The initial manual segmentation just described allowed us to start collecting the user
ratings of news stories required for collaborative filtering while the automatic
segmentation mechanism was being prototyped.

Collaborative filtering is only of benefit if users of the system access or watch news
stories (mined from user logs) and/or rate news stories using the thumbs-up and thumbs-down
indication (as discussed in section 2.2). In order to collect data to support the
collaborative filtering process, a core group of regular users of the Físchlár-News system
have been encouraged to rate news stories since the end of April 2003. To date (mid-October
2003) we have received over 22,000 individual story recommendations from these users. As a
result, when the fully automated Físchlár-News system went live in October 2003, it was
immediately able to generate recommendations of stories for all users, based on
collaborative filtering using the judgements of this core group of users.

5.2 Future Plans 

Given that the Físchlár-News system outlined in this paper is a live system based on
research being carried out within the Centre for Digital Video Processing, it will be
subject to modification and improvement. Our future plans include identifying what other
functionality (from the desktop) can be included in the mobile version while fitting in with
the design guidelines for mobile devices.

Currently a daily reminder email is sent to each user of the Físchlár-News system,
reminding them that the latest news programme has been processed and is available for
browsing and searching. This email is currently identical for all users; however, the
facility exists for us to tailor or personalise each daily reminder email based on the
user's previous preferences for news story content.

Finally, it is possible using SVMs to incorporate additional sources of evidence into
the automatic segmentation process if this is deemed necessary. The results of our
larger test of the performance of the SVM will dictate whether this is required.



Abstract. This paper reports on research into and development of portable hardware that
will enable users in the field to send images and associated positional data from a PDA to
a server for processing. The central aim is to provide navigational and informational
services to an urban mobile user based on building recognition. The paper begins by
describing the hardware before presenting research into server-side building recognition
methods that operate by comparing user-supplied images with images generated by an existing
3D virtual model.



1 Introduction 

Recent advances in the development of personal digital assistants (PDAs) and wireless
communication networks enable a new generation of sophisticated mobile applications. PDAs
can now support a range of add-on devices, such as digital cameras and GPS receivers, and
communicate using a variety of networking protocols, such as WiFi and Bluetooth. With the
increased availability and advanced features of these low-cost, portable devices, there is
the potential to develop a wide range of applications [9, 2]. The combination of mobile
computational, imaging and positioning capabilities with network access opens the door to a
variety of novel applications, such as pedestrian navigation aids, mobile information
systems and other applications usually referred to as ‘location services’ [3]. The system
described in [3] improves GPS accuracy using GPS data, orientation data, images and Hough
Transforms. Its hardware is quite similar to ours, except that it is very heavy and requires
careful calibration. In contrast, this research seeks to exploit the capabilities of the
Personal Digital Assistant (PDA) and the imaging functions of a PDA camera for building
recognition.

There have been several digital city projects [1, 6, 7, 8, 14] concerned with cityscapes and
city models in the past decade. Most of them are designed to be an integrated information
and service environment for everyday life and tourism [1, 6, 7, 8]. Some of them have also
put much effort into 3D models for city promotion applications [16] and similar purposes.
This project also requires a 3D virtual model for reference. However, the application of
this system is to help tourists identify buildings on the road and to provide immediate
information in real time. Image processing for object recognition is one of the main aspects
of this project.

Visitors to a city sometimes find problems in understanding maps or guidebooks, 
even guidebooks with symbols. In another project, surveys of pedestrians found that a 
significant proportion (12% of males, 24% of females) had difficulty in locating 
themselves on a printed map [12]. However, the system described here helps visitors 
to identify their locations and get information about urban objects using the object 
images from a portable commercial PDA. 

Unlike kiosks or other fixed information stands, this system is flexible and dynamic. People
can stop at any interesting object, take a picture of it and send the image to the web
server, together with the corresponding GPS and orientation sensor data from the devices
attached to the PDA system. The server attempts to identify the object using images
generated online from a 3D city model. If the building is found to be the same as the one in
the city model, the server can provide the user with relevant information. The system also
allows PDA users to save their pictures temporarily in a public database on the server and
download them after they have returned home. This addresses the problem of the limited size
of the memory card in portable devices.

To detect objects reliably, a model of the object is needed. Thus, one of the key components
of our approach is a 3D city model. The system described in this paper uses a relatively
small 3D model of the square in front of the University as a reference study. The model was
constructed from a manual survey and the texture maps were derived from photographs. In
addition, a Geographical Information System (GIS) has been used to provide location
information and to enable the transformation of coordinates between the GPS and city model
components.

The overall plan for this project is as follows. Users are equipped with hardware devices
that capture different data: GPS data, orientation data and the image. The use of images
removes some of the problems caused by the low accuracy of GPS and orientation data in urban
areas. Users then send the data to the server for further processing. On the server side,
image processing is used for building recognition. If the image is confirmed to match part
of the model, the building has been identified. In this way, users are able to obtain
information about their location and about the objects they are interested in. The position
reported by the GPS receiver should not change when a user reports from the same location,
but the building located there may change over time. The system therefore not only provides
a handy, low-cost way for tourists to find their way around the city, but also a possible
way to keep the model up to date. A detailed flowchart of the user operation sequence is
shown in section 2.4.

Most of the previous research in this area has been concerned with location-based services.
This paper presents a ‘location and image-based service’, which delivers information about
a specific building of interest to a mobile user in real time through the Internet, by
identifying the building from an image supplied by the user. Realising the image-based
service requires not only the location data (GPS data) of the user but also the direction
and tilt data (from the orientation sensor attached to the PDA).

Section 2 describes the system design, including each component of the hardware devices,
software application and networks. Methods for object recognition are described in section 3
and some results of this project are presented in section 4. Section 5 summarises the main
points of the paper and discusses further research.



2 System Design

The system consists of three main parts: the client side, server side and the connecting 
networks. Fig. 1 shows the relationships between these different parts. 



[Figure: server, client and 3D model components]

Fig. 1. System components diagram.



2.1 Client Side 

The client side is a portable PDA system. It comprises an iPAQ 3870, a NexiCam PDA camera
with a resolution of 600x800, an orientation sensor and a GPS receiver. Because PDA
development was still at an early stage when this project started, the PDA does not provide
enough interfaces for all the devices we wish to use. For example, the iPAQ 3870 provides
one expansion connector for its expansion pack and one universal connector, which can be
converted to a serial port, and it has integrated Bluetooth and GPRS. However, the
orientation sensor has a USB interface and cannot be connected to the Pocket PC. In this
project, therefore, a laptop is used to receive the sensor data from its USB port, and
Bluetooth supports the communication between the PDA and the laptop. This problem should be
resolved in the near future by the next generation of PDAs. The GPS receiver and camera are
connected to the PDA using the universal connector and the expansion connector respectively.
A WLAN card is plugged into the camera, which allows the PDA to access the Internet through
WLAN.

2.2 Server Side

3D City Model

The selected city model is being built from site surveys based on the GIS map, conducted
using a Cyrax2500 laser scanner and 3DMax software. Digital images will be used to record
the details and texture of the buildings, and images from PDA users will keep this database
up to date. A 3D model is constructed from the point clouds provided by the scanner and is
imported into 3DMax for visualization. With this 3D model, it is possible to generate images
from any position and direction, such as those reported by the PDA client. This model image
provides a reference for building recognition. Part of the generation of the city model
image is implemented in 3DMax. Figure 2 shows the building model of the Lanyon Building of
Queen’s University Belfast, which has been implemented as a pilot study. In the top right
corner is the script window (containing text). This script sets the camera to the correct
position and adjusts the camera angle based on the incoming GPS and orientation data from
the user. The camera is the white object in the top left window. The bottom right window
renders the reference image.




Fig. 2. City model in 3DMax environment. 



GIS System 

With the GPS data from the receiver, users are able to identify their locations in the GIS.
The GIS is also used to link the building components to a GPS position. For example, in the
virtual model, the origin is at the right front corner of the building, marked by the
4-point star in the top left window in Figure 2. The GIS converts the 3DMax coordinates to
GPS coordinates.
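
The text does not say how the GIS performs this conversion. Purely as an illustration of the
kind of calculation involved, the sketch below converts a local offset from the model origin
into latitude and longitude using a flat-earth approximation. It assumes that the model axes
point east and north and are measured in metres, which the paper does not state; the
function name and the origin coordinates are illustrative values, not surveyed ones.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS-84 equatorial radius in metres


def model_to_gps(east_m, north_m, origin_lat, origin_lon):
    """Convert a local (east, north) offset in metres to latitude and longitude.

    Uses a small-offset approximation that is adequate over a few hundred
    metres, such as a single city square.
    """
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(
        east_m / (EARTH_RADIUS_M * math.cos(math.radians(origin_lat))))
    return origin_lat + dlat, origin_lon + dlon


# Illustrative origin for the model's reference corner (assumed values).
ORIGIN_LAT, ORIGIN_LON = 54.5844, -5.9342
lat, lon = model_to_gps(0.0, 111.32, ORIGIN_LAT, ORIGIN_LON)
```

A point about 111 m north of the origin comes out roughly 0.001 degrees further north, which
is a quick sanity check on the scale of the conversion.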

Applications 

Several applications are available on the server.

The first application identifies buildings from user-supplied images and provides
information in real time, e.g. on transportation, accommodation, history and events. Matlab
is used for the image processing, and building recognition methods such as line detection
and segmentation are applied.




A PDA-Based System for Recognizing Buildings from User-Supplied Images



The second application is the ‘public space’, where users can save their travelling
pictures temporarily. As the size of memory cards for PDA cameras is limited, it is helpful
if the server provides this public space for users, who can later download the pictures to a
desktop computer. Normal security will be provided.

The final application is the position display. With the GPS data from the PDA user, the
server is able to display the user's location on a 2D GIS map in real time.



2.3 Networking

Bluetooth

A Bluetooth SDK from WIDCOMM has been used to develop a Bluetooth application for
communication between the PDA and the laptop. This program provides three functions. The
first is to synchronize the time between the PDA and the laptop. The second is to transfer
the GPS data from the PDA to the laptop. The last is to instruct the laptop when a picture
is taken and to send the required data to the server through the Internet.

Before any communication, this Bluetooth application must synchronize the time between the
two devices, as all the data are time-ordered.

GPS data are received by the PDA and transferred to the laptop through Bluetooth every
second. Sensor data are also received every second and saved on the laptop together with the
GPS data that have the same receive time. In summary, the file contains one pair of GPS and
orientation data for each second. When the user takes a picture and sends it to the server
for processing, the system also automatically sends the corresponding GPS and orientation
data to the server. “Corresponding” means that the time at which the data were received is
the same as the time at which the picture was taken.
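
The per-second pairing described above can be sketched as a lookup keyed by whole seconds.
This is a minimal illustration, assuming the two clocks have already been synchronized; the
function names, the tuple layouts and the sample values are hypothetical and do not come
from the project's software.

```python
def log_sample(log, received_time, gps, orientation):
    """Store the GPS and orientation readings received in the same second."""
    log[int(received_time)] = (gps, orientation)


def corresponding_data(log, picture_time):
    """Return the (gps, orientation) pair logged in the second the picture was taken.

    Returns None if no pair was logged for that second.
    """
    return log.get(int(picture_time))


# Hypothetical samples: gps = (latitude, longitude), orientation = (heading, tilt).
sensor_log = {}
log_sample(sensor_log, 1000.2, (54.5844, -5.9342), (120.0, 5.0))
log_sample(sensor_log, 1001.1, (54.5845, -5.9342), (121.0, 5.5))

match = corresponding_data(sensor_log, picture_time=1001.9)
```

Truncating both times to whole seconds is what makes the time synchronization step
essential: a clock offset of even one second would select the wrong pair.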

WLAN 

Several WLAN access points have been set up in the city so that PDA users can access the
Internet through WLAN with higher network quality and greater speed in that area, which is
always crowded and has poor network coverage.

GPRS 

GPRS (General Packet Radio Service) is integrated with the iPAQ 3870. This is the 
normal way to access the Internet when WLAN is not available. 



2.4 Flowchart of User Operation Sequence 

Before taking pictures for recognition, the user must first start a custom-built application
on the PDA. This program receives the GPS and orientation data from the devices, transfers
data between the laptop and the PDA over Bluetooth, detects when a picture is taken,
requests the corresponding GPS and orientation data (those whose PDA system time corresponds
to the picture) from the laptop and sends them to the server together with the image.

The user then starts the camera application on the PDA (this application is from the camera
manufacturer and is different from the custom-built application). When the user presses the
button to take a picture, the background program (the custom-built application) records the
PDA system time of this action. Once the user is satisfied and decides to use the picture
for building recognition, the background program requests the GPS and orientation data with
the corresponding PDA system time from the laptop and sends them to the server together with
the picture. When the data are received, the server first runs 3DMax to generate a reference
image online from the same position and angle in the 3D model (based on the incoming GPS and
orientation data from the user), as shown in Figure 5. It then attempts to match the user's
PDA picture with the reference image. If the two pictures are identified as showing the same
building, the server provides information on the object and location. Otherwise, the notice
“object unidentified” is sent to the PDA user, although the user's location on a 2D map is
still available. The operation is shown as a flowchart in Figure 3.



[Figure: flowchart with PDA user and Server columns]

Fig. 3. Flowchart for user.



3 Object Recognition Methods 

There are two building recognition methods applied in this project: line detection and
colour-based segmentation.

3.1 Hough Transform

The first method for object recognition relies on line detection. Several methods of line
detection have been developed in the past decade, of which the Hough Transform (HT) [15, 16,
11] and the Radon Transform (RT) [13] are the two most important. Both transform a
two-dimensional image containing lines (the original coordinate plane) into a domain of
possible line parameters (Hough space), in which each line in the image produces a peak
positioned at the corresponding line parameters. In the original coordinate space (image
coordinates), lines are represented in the form y = ax + b. In Hough space (parameter
coordinates), lines are described in other forms. The most popular of these is
rho = x*cos(theta) + y*sin(theta) [4], where theta is the angle and rho the smallest
distance to the origin of the coordinate system, a representation also known as the polar
coordinate form. In image space, a line is made up of points. Each point, however, appears
as a sine wave in parameter space. The intersection of different sine waves represents the
line made up of all these points, as shown in Figure 4. The more waves pass through an
intersection, the more points are located on that line. We call such an intersection a
“peak”. After sampling the image, we are able to find the peaks in parameter space that
represent the main lines in the image.
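
The voting process described above can be sketched in a few lines. This is a plain Hough
accumulator over a list of edge points, written for illustration only; it is not the
modified Radon Transform used in the experiments, and the function name, bin sizes and
synthetic points are assumptions.

```python
import math
from collections import Counter


def hough_peaks(points, top_n=1, theta_step_deg=1, rho_step=1.0):
    """Vote in (rho, theta) space and return the strongest peaks.

    Each result is ((rho, theta_in_degrees), number_of_votes).
    """
    votes = Counter()
    for x, y in points:
        # Each image point traces a sinusoid rho(theta) in parameter space.
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(rho / rho_step), theta_deg)] += 1
    return [((rho_bin * rho_step, theta_deg), count)
            for (rho_bin, theta_deg), count in votes.most_common(top_n)]


# Synthetic example: 100 points on the vertical line x = 5, plus two stray points.
POINTS = [(5, y) for y in range(100)] + [(20, 30), (40, 10)]
PEAKS = hough_peaks(POINTS)
```

For these points the strongest peak lies at rho = 5, theta = 0, which is the vertical line
x = 5; all 100 points on the line vote for that one bin, while the stray points do not.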



In this experiment, the RT is applied with two modifications.

First, the sampling of the angle theta was set to take more samples around the vertical and
horizontal directions and fewer in other directions, as lines in buildings tend to be found
in these directions.

Second, pre-processing is applied to the image and post-processing to the detected lines.
Pre-processing includes different filtering and colour space conversions. In
post-processing, we require that the parameters of the lines found in the image differ from
each other by more than limits we define, e.g. 10° in angle and 30 pixels in rho. If the
parameters of two lines are within these limits of each other, the two lines are very close
and are probably the same line (see Figure 7(b)).
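
One way to read the post-processing rule above is sketched below: a detected line is dropped
when both its angle and its rho are within the limits of a line that has already been kept.
This interpretation, the function name and the sample values are assumptions made for the
example; the sketch also ignores the wrap-around between angles near 0° and 180°.

```python
def merge_close_lines(lines, max_dtheta=10.0, max_drho=30.0):
    """Keep one representative for each group of near-duplicate lines.

    lines is a list of (rho, theta_in_degrees) pairs, assumed to be ordered
    with the strongest peak first, so that the strongest line of each group
    is the one that survives.
    """
    kept = []
    for rho, theta in lines:
        is_distinct = all(
            abs(theta - kept_theta) > max_dtheta or abs(rho - kept_rho) > max_drho
            for kept_rho, kept_theta in kept
        )
        if is_distinct:
            kept.append((rho, theta))
    return kept


# Illustrative values only: the second line nearly duplicates the first.
detected = [(67, 95), (70, 96), (200, 95), (67, 5)]
distinct = merge_close_lines(detected)
```

The second line is removed because it lies within 10° and 30 pixels of the first, while the
parallel line at rho = 200 and the line at a very different angle are both kept.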

Fig. 4. Hough/Radon Transform.

3.2 Segmentation

The segmentation of images based on colour and texture cues is formulated as a clustering
problem. Small blocks of image pixels are grouped together on the basis of local colour
space statistics, which are captured by Gaussian models. Clustering is one of the
fundamental methods for image segmentation [5, 10]. It is implemented in the following
steps [5]:

1. Data representation: The data types represent the objects in the best way to stress
relations between the objects, e.g. similarity.

There are three major types of data representation: vectorial data, distributional data and
proximity data. In this project, data are represented as distributional data, which are
described by an empirical probability distribution or histogram over features x in F.

2. Modelling: How to formally characterize interesting and relevant cluster structures in
data sets.

The goal of modelling is to assign objects with similar properties to the same cluster and
dissimilar objects to different clusters. In distribution clustering of histogram data,
objects are grouped according to the similarity of their histograms P(x|o) with a
cluster-specific prototypical distribution of features, which is parameterised by theta_a.
The natural distortion measure between two histograms is defined by the Kullback-Leibler
divergence.

3. Optimisation: How to efficiently search for cluster structures.

K-Means clustering is applied. It is a least squares partitioning method that divides a
collection of objects into K groups.

4. Validation: How to validate selected or learned structure.
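
Two of the ingredients named in the steps above, the Kullback-Leibler divergence and K-Means
clustering, can be sketched as follows. Both functions are simplified illustrations and not
the project's Matlab code: the K-Means version works on single numbers (such as a mean Hue
value per block) rather than on Gaussian models, and the deterministic initialisation, the
function names and the sample values are assumptions.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two histograms.

    Both histograms must sum to 1, and q must be non-zero wherever p is.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kmeans_1d(values, k, iterations=20):
    """Least squares partitioning of scalar values into k groups."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres over the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: (v - centres[i]) ** 2)
            groups[nearest].append(v)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres


# Illustrative hue-like values forming three obvious groups.
HUES = [0.10, 0.12, 0.11, 0.50, 0.52, 0.48, 0.90, 0.88, 0.92]
CENTRES = kmeans_1d(HUES, k=3)
```

With these values the three centres settle at about 0.11, 0.50 and 0.90, one per group.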

4 Results 

4.1 City Model Image 

The model image (Figure 5) is treated as a reference, so generating the correct image is
very important to this project. Figure 5 shows the image rendered by a 3DMax script based on
the user's GPS and orientation data. We attempt to match this reference image with the user
image (Figure 6).



4.2 Radon Transform 

Figure 7(a) and (b) show the lines found before and after the post-processing described in
Section 3. Figure 7(a) contains more errors in vertical line detection (inside the
long-dashed oval). This is because the points around those edges are very dense and noisy
(caused by the pattern in the real image), so a single line is misinterpreted as several
lines whose parameters (rho and theta) are very close to each other. Based on this
knowledge, we can apply post-processing to eliminate this error. After post-processing, the
lines in Figure 7(b) are more reasonable. Figure 8(b) shows the lines from the model image
after post-processing. Some differences can be seen between Figure 7(b) and Figure 8(b).
These are caused by differences in the pattern and by the limited accuracy of the
orientation and GPS data. We plan to replace GPS with DGPS (Differential GPS) data in the
near future, which should improve this result.

Fig. 5. Reference Image from the 3D model.




Fig. 6. User image from client PDA camera (600x800 resolution). 

Table 1 displays the parameters for the user image before and after post-processing and for
the reference image (from the 3D model) after post-processing. In this table, each pair of
rho and theta defines a peak in Hough space (shown as the stars in Figure 8(a)). Figure 8(a)
shows, in Hough space, the peaks corresponding to the lines in Figure 8(b); in image space,
each of these peaks represents a line, as in Figure 8(b).

The line parameters for Figure 7(b) and Figure 8(b) do not match perfectly. However,
if we allow a small difference between the line parameters, e.g. Δtheta <= 3° and
Δrho <= 50 pixels (which is a small error), then Figure 7(b) and Figure 8(b)
have 6 matching lines. Line 02, Line 03, Line 04, Line 07, Line 08 and Line 10 in
Figure 7(b) match Line 04, Line 01, Line 06, Line 03, Line 07 and Line 10 in
Figure 8(b), respectively. This method will be used in conjunction with the
segmentation approach, and modelled data will be compared with sampled images in
order to determine the usefulness of each method.
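The tolerance-based comparison of line parameters can be illustrated with a short sketch. This is not the implementation used in the system: the function name, the greedy one-to-one pairing and the sample values are assumptions made for illustration, and only the two tolerances (3° and 50 pixels) come from the text. Wrap-around of theta (e.g. 0° versus 179°) is ignored here.

```python
from typing import List, Tuple

Line = Tuple[float, float]  # (rho in pixels, theta in degrees)


def match_lines(user: List[Line], model: List[Line],
                max_d_rho: float = 50.0,
                max_d_theta: float = 3.0) -> List[Tuple[int, int]]:
    """Pair each user-image line with at most one model line.

    A pair is accepted only if both parameter differences are within
    tolerance. Among the acceptable candidates, the one with the smallest
    normalised difference is chosen (greedy choice, an assumption).
    """
    matches = []
    used = set()  # indices of model lines that are already paired
    for i, (rho_u, theta_u) in enumerate(user):
        best_j, best_cost = None, None
        for j, (rho_m, theta_m) in enumerate(model):
            if j in used:
                continue
            d_rho = abs(rho_u - rho_m)
            d_theta = abs(theta_u - theta_m)
            if d_rho <= max_d_rho and d_theta <= max_d_theta:
                cost = d_rho / max_d_rho + d_theta / max_d_theta
                if best_cost is None or cost < best_cost:
                    best_j, best_cost = j, cost
        if best_j is not None:
            used.add(best_j)
            matches.append((i, best_j))
    return matches


# Hypothetical line parameters, for illustration only.
user_lines = [(67.0, 95.0), (300.0, 10.0)]
model_lines = [(310.0, 12.0), (60.0, 94.0), (900.0, 45.0)]
print(match_lines(user_lines, model_lines))
```

The number of accepted pairs could then serve as a simple similarity score between the user image and the reference image.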

Table 1. Line parameters. 





           Lines in Figure 7(a)    Lines in Figure 7(b)    Lines in Figure 8(b)
           rho      theta          rho      theta          rho      theta
Line 02    67       95             67       95             -244     84
Fig. 7. Lines detected in the user images using the Radon Transform (a) before post-processing
and (b) after post-processing.
Fig. 8. Lines detected using the Radon Transform: (a) shows the lines of (b) in Hough space,
and (b) shows the lines in Figure 5 after post-processing.



4.3 Segmentation 

In this experiment, the segmentation method is applied to each picture and the
centre of each group is found. The results show that two different pictures of the
same building have close centres, while the centres for a different building differ
more (shown in Figure 10).

In order to arrive at an initial estimate of the underlying Gaussian alphabet for the
later stages of clustering, a conventional Gaussian mixture model estimation step is
carried out with the colour values of the image pixels as input data (this experiment
uses Hue and Intensity). Each input datum (the mean and standard deviation of a
Gaussian model) is generated from a block of 6x8 image pixels. In other words, there
are 100x100 input data in total (as the image is 600x800). The next step is to cluster
this set of 10,000 data points into 3 groups according to their Hue or Intensity
values. In this case, K-Means is applied to do the clustering. Results of the
segmentation are shown in Figure 9.
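The two steps just described (one mean and standard deviation per pixel block, followed by K-Means with three groups) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the code used in the experiment: the function names, the block orientation and the deterministic choice of initial centres are all hypothetical, and a single channel (Hue or Intensity) is assumed to be given as a 2-D list of values.

```python
import math


def block_features(channel, block_h=6, block_w=8):
    """Return one (mean, std) pair per non-overlapping block of a 2-D channel."""
    rows, cols = len(channel), len(channel[0])
    feats = []
    for r in range(0, rows - block_h + 1, block_h):
        for c in range(0, cols - block_w + 1, block_w):
            vals = [channel[y][x]
                    for y in range(r, r + block_h)
                    for x in range(c, c + block_w)]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            feats.append((mean, math.sqrt(var)))
    return feats


def kmeans(points, k=3, iters=50):
    """Plain K-Means on (mean, std) points; returns (centres, labels)."""
    # Assumption: initial centres are spread over the points sorted by mean.
    ordered = sorted(points)
    centres = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k),
                      key=lambda j: (p[0] - centres[j][0]) ** 2
                                    + (p[1] - centres[j][1]) ** 2)
                  for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centres.append(centres[j])  # keep the centre of an empty group
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels
```

For a 600x800 channel, `block_features` yields the 10,000 data points mentioned above, and `kmeans(..., k=3)` returns three centres of the kind listed in Table 2.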





Fig. 9. Segmentation for 3 different pictures (Picture 9, Picture 4 and Picture 5): (a) the
original pictures and (b) the segmentation based on Hue and Intensity.



Figure 9(a) shows the 3 PDA camera pictures (with resolution 600 x 800) taken by
different people at different times. The problem of different users taking different
views but expecting the same result is clearly evident. This also allows the
robustness of the building identification to be tested. Later versions of the system
will direct the user to take further pictures or assist them in taking better
pictures. Figure 9(b) shows the segmentation based on the Hue and Intensity values,
where the image has been processed so that each pixel, in hue or intensity, belongs
to one of three groups. Table 2 shows the mean and standard deviation of each of
these groups. For example, based on Hue, there are 3 centres (for 3 groups) for
Picture 4: [0.1077 0.0046], [0.3444 0.0527] and [0.6160 0.0149]. In the first centre
[0.1077 0.0046], 0.1077 is the mean value of this group's Gaussian model and 0.0046
is the standard deviation. These results are all plotted in Figure 10. The pictures
in Figure 9(a) show two different red-brick buildings, but even here it can be seen
that the hue and intensity values of one building differ significantly from those of
the other. Work is presently under way to develop appropriate limits for the
segmentation sets.



Table 2. Centre for each group in segmentation. 





             3 group centres based on Hue    3 group centres based on Intensity
             (mean, standard deviation)      (mean, standard deviation)

Picture 4    0.1077  0.0046                  0.8831  0.0030
             0.3444  0.0527                  0.5934  0.0075
             0.6160  0.0149                  0.3293  0.0053

Picture 5    0.1286  0.0056                  0.9162  0.0083
             0.3282  0.0483                  0.3261  0.0083
             0.5537  0.0123                  0.6748  0.0124

Picture 9    0.1353  0.0065                  0.2836  0.0137
             0.3386  0.0403                  0.5516  0.0233
             0.5901  0.0209                  0.8409  0.0105



In Figure 10, the cross, dot, and star represent the centres for the hue and
intensity values in Picture 4, Picture 5 and Picture 9, respectively. The X and Y
values of each centre are the mean and standard deviation of the Gaussian model of
that group. In other words, these centres in some way represent the features of the
image.
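One possible way to quantify how close the centres of two pictures are is sketched below, using the Hue centres from Table 2. The distance measure (mean Euclidean distance between centres paired after sorting by mean) is an assumption made for illustration. The text does not specify a measure, and suitable limits are still being developed.

```python
import math

# Hue centres (mean, standard deviation) taken from Table 2.
HUE_CENTRES = {
    "Picture 4": [(0.1077, 0.0046), (0.3444, 0.0527), (0.6160, 0.0149)],
    "Picture 5": [(0.1286, 0.0056), (0.3282, 0.0483), (0.5537, 0.0123)],
    "Picture 9": [(0.1353, 0.0065), (0.3386, 0.0403), (0.5901, 0.0209)],
}


def centre_distance(a, b):
    """Mean Euclidean distance between two sets of centres.

    Centres are paired after sorting by mean value, so the order in
    which K-Means happened to return the groups does not matter.
    """
    pairs = zip(sorted(a), sorted(b))
    return sum(math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in pairs) / len(a)


names = sorted(HUE_CENTRES)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = centre_distance(HUE_CENTRES[first], HUE_CENTRES[second])
        print(f"{first} vs {second}: {d:.4f}")
```

Sorting before pairing matters for the Intensity centres in Table 2, which are listed in a different order for each picture.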




Fig. 10. Centres for each group: (a) shows the centres from Hue segmentation and (b) shows the
centres from Intensity segmentation.
5 Conclusion

In this paper, a system that helps people acquire urban information, including
building and geographical information, is presented. It integrates different hardware
devices, software applications and networks.

With this system, the city model is not only of use for city promotion, indoor,
environment planning or architectural design, but also offers a useful database for
tourists and travelers. We believe our system provides a good demonstration of a
PDA application and, because of its mobility, is especially useful for tourists. Its
main contribution is that people can travel around without having to refer to maps
and guidebooks. The city model is fully used to provide information to people at any
time and anywhere, in contrast to fixed kiosks and indoor presentations. Some public
space on the server is available for users to keep their pictures temporarily, to
overcome the limitation of the memory card. Two methods for object recognition have
been described and improvements have been discussed.

As the PDA is still not a complete system for this research, there have been
difficulties in working around the limitations of the device; e.g., as described in
Section 2.1, the PDA does not provide enough interfaces.

The battery also does not last long. Since this application is designed especially
for tourists, it is important for the PDA to work for a long time, as it cannot be
charged all the time.

Further work will include:

• Continuing the custom-built program, e.g. the management of orientation data
and GPS data and how to catch the “take picture” action

• Applying segmentation for building recognition

• Automating the whole system
Abstract. Handheld devices, like PDAs and mobile phones, are increasingly
used to access information on the Web. However, because most Web pages are
designed for desktop PC screens whereas these devices have small screens, only
a small region of a page is visible at a time. Reading and finding information is
therefore difficult and requires extensive scrolling, both horizontally and
vertically. One way to address this problem is an overview plus detail approach
to the design. We describe SearchMobil, a working system that supports
the user in viewing and searching the Web with a PDA. It is based on our
SmartView technology, which performs content and layout analysis of Web pages
and thus provides the foundations for SearchMobil features. We present the results
of a user study that shows the utility of the SearchMobil approach and provides
further insight into the challenges and opportunities that the mobile Web presents.



1 Introduction 

With the proliferation of Internet capable mobile devices, such as Personal Digital 
Assistants (PDAs) and Web enabled mobile phones, deficiencies of current Web pub- 
lishing practices have become apparent. Mobile devices have small screens; yet most 
Web pages are designed on the assumption that they will be viewed from a standard 
desktop screen. Those with complex layout require a certain minimal screen space 
which mobile devices cannot provide. Such pages are therefore difficult to view on a 
mobile device without extensive scrolling, both horizontally and vertically. 

This problem could be alleviated by a document format that allows authors to de- 
scribe conventional layout features, such as multiple columns, sidebars, menus, etc., in 
an abstract and flexible way, and that can degrade gracefully when features cannot be 
represented reasonably on a given screen size. When the screen size is incompatible 
with the layout it should preserve as much of the designer’s intent as the device can 
accommodate. Such a document format, however, does not exist yet, and HTML as
the Web's main document format does not provide any of the mentioned features. As a
consequence, Web designers usually hard code their layout intentions using HTML 
tables with fixed column widths and small blank images for spacing, effectively turn- 
ing HTML into a layout description language. This results in rigid, inflexible, fixed- 
size Web page layouts that cannot be re-flowed to preserve the logical structure when 
viewed on smaller screens. 
Fig. 1. A Web page with a complex design, as seen on a Pocket PC. The original page width is
three times the width of the Pocket PC display



Currently, far too little published material on the Web is suitable for mobile de- 
vices, despite the fact that most of today’s mobile phones have Internet capabilities. 
This material includes pages with simple layout that display well on any screen, or 
material specifically targeted to small devices, for example, by use of the Wireless 
Application Protocol (WAP). The vast amount of standard, unmodified Web content,
however, remains effectively inaccessible to mobile devices. Thus, a number of
attempts have been made to improve upon this situation.

Some of them focus on eliminating the negative effects of (horizontal) scrolling. 
Breaking a page into discrete sections, i.e., “sub-pages” that can be accessed by click- 
ing ‘Previous’ and ‘Next’ links or buttons, provides a decent alternative in some cases 
([4], [9], [12]). Wrapping the text to fit the width of the screen, or “linearizing” the 
two-dimensional layout of the page into a vertical axis, completely eliminates the need 
for horizontal scrolling. However, the results are long - sometimes very long - docu- 
ments. 

A technique particularly useful for mobile phones involves the suppression or
elimination of original data. Instead of the full content of the page, the user is
presented with key sentences or navigation elements, such as links, which serve as a
content summary ([1], [2]). The interface allows the user to request additional details
about a particular summary element. This approach requires robust summarization
techniques if it is to be applied generally. More often than not, though, quality
summaries require assistance from human editors.

A further concern is the issue of consistency in the search and browsing experience
across devices. The underlying premise is that the same ‘look and feel’ of the Web
content across devices is desired by both authors and users ([3], [5], [6], [7], [14]). The
task, therefore, is first to design representations of the content that connect the user
with the familiar content presentation, as experienced in the desktop environment.
Because of the space constraint this is likely to be an overview rather than a directly
readable and consumable format. The second challenge is to devise interaction
techniques that enable quick and effective access to the detailed view of the relevant
content.

SmartView [6], [7], the Web page analysis technique developed in [3], and Web- 
Thumb [14] all provide Web page overviews in the form of static or zoomable thumb- 
nails. Interaction with the graphical overview ranges from simple tapping on a specific 
region of the graphical overview to more sophisticated WebThumb interactions that 
include ‘picking’, ‘zooming’ and ‘panning’. These are non-standard for current PDA 
browsers but likely features of future versions. A similar overview plus detail ap- 
proach is exploited in SearchMobil [10], an application built on SmartView technol- 
ogy, that supports the user in a variety of search situations: from Web search, facili- 
tated by on-line search engines, to search focused on pages seen by the user, to a 
simple, within-page ‘find’ function that helps locate relevant portions of the text 
quickly and effortlessly. 

It is this aspect, the consistent experience across devices, that is of interest here. We
shall describe in more detail the SmartView technology and its application within 
SearchMobil. Usability studies of the two provide valuable insight into benefits and 
drawbacks of this particular overview plus detail approach and shed more light on the 
general issues of Web access on small devices. 



2 SmartView Technology 

The SmartView approach recognizes the importance of the content’s intended layout, 
as specified by the author, and the fact that Web pages typically consist of a number of 
coherent logical units of the content. Whereas in the HTML implementation these 
units are not explicitly marked, we discover them by analyzing the structure of the 
page layout. We can then allow the user to select any of these units and view it inde- 
pendently from the rest of the document. These portions are usually simple, non- 
structured HTML fragments, and can be re-flowed easily to accommodate the nar- 
rower screens of mobile devices. The page thumbnail overview is displayed with su- 
perimposed outlines indicating the segments (fig. 2, left). The user can navigate to a 
specific detail region by tapping on one of the outlined areas with the stylus (fig. 2, 
right). 
The user can quickly switch back and forth between the thumbnail overview and
the detailed view. While the browser is in the SmartView mode, subsequent access to
pages is facilitated by corresponding thumbnail overviews, created as the user
follows the links.
Fig. 2. Web page thumbnail providing an overview and indicating the logical segments (left).
Detailed view of a selected segment (right), displayed for optimal viewing and reading



2.1 Page Analysis and Decomposition 

SmartView relies only upon geometric properties of the HTML page and is, therefore, 
language independent. It identifies geometric characteristics of page elements by 
downloading the page content, including all images, and rendering it to a standard 
width, suitable for viewing on a desktop computer (e.g., 800 pixels wide). From this 
layout, we create a thumbnail image, sized to fit the screen of the handheld device. 
The corresponding document object model (DOM) allows us to access and inspect 
individual HTML elements of the page, such as tables, cells, and forms. We recur- 
sively traverse the HTML DOM and consider the sizes and arrangements of these 
elements. Based on simple heuristics about their width and height, we decide whether 
a table or a cell should be marked as a “logical section” or whether we should continue 
the process of subdividing or merging individual elements. 

The result of this analysis is a vector of nodes that correspond to tables, cells within 
tables, rows, and similar elements from the DOM, which are identified as logical sec- 
tions. When the user requests such a section for viewing, we create an HTML docu- 
ment by extracting the HTML representation of the selected node and all its contents. 
This HTML segment is wrapped by additional HTML code to obtain a representation 
of the full path from the root of the DOM down to the node. In this manner we provide 
a minimal, yet structurally consistent HTML document that can be displayed by the
Web browser in the standard manner.



Remote server implementation of the SmartView processes:

• Load the HTML page in the browser at the appropriate zoom level to support either
full document viewing or vertical scrolling only.

• Top-down approach: determine the level of the embedded table that is likely to
provide sensible logical units of the content.

• For each element of the document partition, identify the HTML equivalent and
create an HTML document.

• Apply heuristics for the layout on the target device: specify the relationship and
relative size of the elements in the selected section of the document.

Fig. 3. Steps involved in creating SmartView of a Web page



The current SmartView implementation relies upon a service, hosted outside the
device, which performs the analysis of the page layout and page partitioning,
thumbnail creation, and layout modification on behalf of the client. The SmartView
client is simply an HTML page, with scripts running on the device and forwarding the
processing requests to the server. With new releases of the browser software for PDAs
it will be possible to implement the SmartView feature completely on the device, if
desired. It is likely that the thumbnail overview will be replaced by a zoomed-out
version of the live page, displayed in the scaled-down browser window, in a manner
similar to WebThumb ([14]). Alternatively, this service can become a part of the
publishing process.
As the author completes the page design, all the elements, the thumbnail overview, the 
page analysis, and HTML documents corresponding to individual sections, could be 
prepared and stored on the Web server for consumption by the client. Even dynami- 
cally generated contents could be handled similarly by placing more control over the 
analysis and delivery of individual sections into the hands of publishers. 
An approach similar to SmartView, but more elaborate in its attempt to classify
page elements into sidebars, body of the document, menus, and similar, is explored in
[3]. Evaluation statistics show that page analysis based on simple geometric properties
results in satisfactory page decomposition for 90% of the 50 Web sites tested in the
study ([3]).



3 SearchMobil - Search Support for PDAs 

SmartView provides a simple yet effective way of making Web pages with complex 
layout more accessible to mobile devices. However, its overview plus detail approach 
also provides a framework for exposing a variety of information about the document 
or its sections. In particular, it can expose the details related to their usage or process- 
ing by other services, such as search. 

It has been observed that users often resort to a multi-stage strategy when
searching the Web. They specify a broader query to obtain potentially useful pages
and then weed out irrelevant ones by skimming the text and carefully inspecting those 
that seem likely to contain information they seek. On mobile devices this process is far 
more difficult because of the small screen size. Direct access to relevant parts of the 
document is thus invaluable. That is the objective of developing the SearchMobil 
application. 

SearchMobil supports the user when he or she has submitted a query to a Web 
search engine, and is browsing through the results. It facilitates: 

Annotation of overview and detailed view with search hits. While the search engine 
produces a ranking according to its estimate of which documents are most relevant to 
the query, SearchMobil provides an indication of which part of a particular document 
is most relevant to the posed query. It adds search hits annotations to the overview and 
detail presentation of the document to assist the users in judging the relevance of each 
page region. It quickly directs their attention to the most promising parts of a docu- 
ment. 

In the overview of a page, small squares are placed in each region to indicate the 
number of query term hits it contains. The region with most hits is outlined in red 
instead of green (fig. 4, (a)). In the detailed view, the query terms are highlighted (fig. 
4, (b)). These enhancements are facilitated by providing a local search capability on 
the device or through a remote service. 
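As a rough illustration of this annotation step, the sketch below counts query term hits per region and picks the region to outline in red. It is an assumption-laden sketch rather than the SearchMobil code: the tokenisation, the exact-word matching and the tie-breaking rule (first region wins) are all hypothetical.

```python
import re


def count_hits(text, query_terms):
    """Count occurrences of any query term among the words of a region."""
    terms = {t.lower() for t in query_terms}
    return sum(1 for word in re.findall(r"\w+", text.lower()) if word in terms)


def annotate_sections(sections, query_terms):
    """Return (hits per region, index of the best region or None if no hits)."""
    hits = [count_hits(text, query_terms) for text in sections]
    best = max(range(len(hits)), key=hits.__getitem__) if any(hits) else None
    return hits, best


# Hypothetical region texts for a page returned for "elements typography".
regions = [
    "Home News Contact",
    "The Elements of Typography covers typography basics",
    "Typography links",
]
hits, best = annotate_sections(regions, ["elements", "typography"])
print(hits, best)  # region 1 would be outlined in red, the others in green
```

The per-region counts correspond to the small squares drawn in the overview, and the same term set can drive the highlighting in the detailed view.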

Refinement and focusing of search. The set of the top ten result pages is automatically 
downloaded, and is itself presented in an overview plus detail form: a tabbed ‘book- 
let’, as shown in figure 4. The tabs along the right-hand side of the page provide an 
overview indicator that shares the screen with the current detail region, allowing direct 
access to each search result. When the user taps on one of the tabs, he or she is pre- 
sented with the annotated overview of that document. 
Fig. 4. The SearchMobil booklet interface showing documents in the context of a set of search
results: (a) result 4 of a global search for “elements typography”; (b) a selected segment from 
this page, with highlighted query terms; (c) a local search over the original result set, for “color 
palette” (the tabs of relevant pages have changed colour), showing the new term hits; (d) the 
same segment as in (b) with the new query terms highlighted 



The user can specify an alternative search query that will be applied to all the
documents in the booklet through the local search facility. The tabs of those
documents that contain the new query terms change colour (fig. 4, (c)), and the
documents are annotated with hits accordingly, in the thumbnail overview and the
detailed view (fig. 4, (d)).
3.1 User Study

Users engage in a variety of information seeking tasks on the Web. Studies have indi- 
cated that “finding” a specific, well defined piece of information, and “gathering in- 
formation” as a more open ended, research oriented activity, are among the common 
tasks ([11]). While users are likely to use the desktop to gather information, when 
mobile they may use their PDAs or mobile phones for fact finding, for example. We 
wish to investigate how useful SearchMobil is in that context. We are particularly 
interested in learning whether the current indicators of the “best” region in the docu- 
ment overview help the user to locate information quickly and reliably. 

We assume that the “finding” task starts with a query that does not necessarily
contain terms that describe the information sought. Indeed, the user is typically
looking for a detail that he or she does not know. Thus, the query is only a vehicle to get the
user closer to the portion of the text which may contain the relevant detail. The suc- 
cess rate depends on the likelihood that the query terms and the answer co-occur 
within the same document. In case of SearchMobil, where the user is inspecting indi- 
vidual sections of documents, this requirement is even stricter: query terms and the 
answer would ideally co-occur in the best scoring section since that one is most 
prominently marked by the system. Otherwise, the user may be misled by the rele- 
vance indicators and it may take them longer to ‘recover’ and find the correct region. 
With this in mind, we designed a study that covers three situations, two of which
we expect to reveal possible weaknesses of the current user interface design. We look
at tasks of:

• Type X: where the page can be divided up into sections by SearchMobil, and where 
the answer is in the section that is outlined in red in the overview (i.e., marked as 
most relevant, according to the search terms used). We expected SearchMobil to 
perform better than the Pocket Internet Explorer (Pocket IE) browser in these tasks. 

• Type Y: where the page has only a single section. We expect that SearchMobil 
would perform slightly worse than Pocket IE, because the overview adds no addi- 
tional information to the detail view while it presents an intermediary step that 
takes time to load and review. 

• Type Z: where the search result page can be divided up into sections by SearchMo- 
bil, and where the correct answer is not in the section that is outlined in red in the 
overview. This situation may arise when the user enters search terms that are more 
general than the actual information requirement. For example, the goal of the task 
Z2 in Table 1 is to find the postal address of the charity “Shelter”, but the search 
term is simply “shelter”. Because the participants are being directed to a section 
that does not contain the correct answer, SearchMobil will probably perform 
slightly worse than the standard browser, in terms of time spent searching for an 
answer. 

We selected 12 questions (Table 1) from the TREC-9 and TREC-10 Web query
collections, drawn from query logs of on-line search engines ([13]). For each question
we specified the search terms ourselves and submitted them to Google
(http://www.google.com). From each set of search results we retained one document
that contains the answer to the question and used it in the experiments. The
distribution of answers was such that six were selected for category X, three for Y
and three for Z. 24 subjects, 16 male and 8 female, aged 20 to 42, searched for answers
to the questions among the selected documents, using either SearchMobil or the Pocket
IE browser, on the Compaq iPAQ 3760.



Fig. 5. Examples of the three types of task (X, Y and Z), showing the portion of the answer page
that was visible without scrolling for both types of browser: Pocket IE in the left column and
SearchMobil page view on the right. In the SearchMobil overviews, the recommended region is
outlined in red, with two yellow squares in the top right corner
We measured the time it took to complete each task. Based on Analysis of Variance
(ANOVA) we found that there was no major effect of the browser type on task time
(F(1,238) = 1.26, p = 0.23); overall, there was very little difference between the two
browsers, with a mean time of 17.87 seconds for Pocket IE and 18.13 seconds for
SearchMobil. However, as we expected, there was a significant difference in
performance between tasks (F(11,238) = 25.0, p<0.001), and a significant interaction
between the browser and the task (F(11,238) = 3.83, p<0.001). These differences are
enumerated in Table 2.



Table 1. The questions (and their associated search terms) used in the experiment 



ID   Question                                                     Search terms

X1   How many hexagons and pentagons are there on a football?     hexagons pentagons football
X2   How tall is the Sears Tower, in feet?                        sears tower height
X3   In which year did Hawaii become a state of the USA?          hawaii became state
X4   Who is credited with inventing the paper clip?               paper clip invented
X5   What is the salary of a UK member of parliament?             uk mp salary
X6   How many miles is it from London to Plymouth?                miles london plymouth
Y1   How much was a third-class ticket for the ship “Titanic”?    titanic ticket cost
Y2   Which polymer is used to make bulletproof vests?             polymer bulletproof
Y3   How long is the average elephant pregnancy?                  elephant pregnancy
Z1   What is the telephone number of the University of Sussex?    university sussex
Z2   What is the postal address of the charity “Shelter”?         shelter
Z3   Which metal has the highest melting point?                   metal highest melting point



3.2 Conclusions of the Study 

Type X. As we expected, SearchMobil generally outperformed Pocket IE in these 
tasks as the answer to the question was in the section marked as most relevant. Task 
X3 was the only exception since only a small part of the most relevant section was 
visible (at the bottom of the screen) without vertical scrolling. Many of the partici- 
pants did not see the outlined red section and clicked on the top section instead. 
Type Y. We expected that SearchMobil would perform slightly worse than Pocket IE
for these tasks, because the SearchMobil detailed view of the page is simply a single
long document. This was generally true, although in the case of task Y2, where the
answer was quite far down the page, it seems that SearchMobil’s term highlighting
helped participants to find the answer more quickly. In pages of this type, the user
should probably be taken straight to a suitable detail view, instead of navigating via an
intermediate overview. In the future, as an alternative visualization of page structure,
such documents could be segmented at the paragraph level.

Type Z. Again, we expected that SearchMobil would perform worse than Pocket IE 
for these tasks. Task Z1 was relatively easy in both browsers - the document was short 
and, although in the SearchMobil case the answer was not in the red-outlined section, 
there were only three other sections to check, all of which contained very little text. 

Table 2. Mean completion times (in seconds) for each combination of task and browser. Num- 
bers in bold indicate the tasks for which SearchMobil outperforms Pocket IE 
Fig. 6. Histogram derived from the averages of log(time) statistics, calculated over all the tasks
of a given type (X, Y, or Z). We apply the inverse of the log to each average to arrive
at the presented time statistics
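Averaging log(time) and then applying the inverse of the log amounts to taking the geometric mean of the completion times, which is less sensitive to a few very slow trials than the arithmetic mean. A minimal sketch, with made-up times and a hypothetical function name:

```python
import math


def log_mean_time(times):
    """Average the log of each time, then undo the log (geometric mean)."""
    mean_log = sum(math.log(t) for t in times) / len(times)
    return math.exp(mean_log)


# Hypothetical completion times in seconds, not data from the study.
sample = [10.0, 40.0]
print(log_mean_time(sample))       # geometric mean of 10 and 40
print(sum(sample) / len(sample))   # arithmetic mean, for comparison
```

The base of the logarithm does not matter as long as the matching inverse is applied.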



A written questionnaire completed by the participants provided us with additional
insight into the clues that the users rely upon while searching for information ([10]).
For example, participant 18 described in note form his search strategy using
SearchMobil as follows: “Addresses, and the like, usually located at edge of page, so
look there first for those. Other content, check main body of page, top to bottom; if
information not found, try another part of the page. Very easy to move around the
document and focus in on particular sections. Having an overview, with a little
practice, allows you to guess where the content you want may be fairly well -
especially for addresses, etc.”

3.3 Analysis of the Task Distribution 

In order to assess the practical value of the SearchMobil approach, it is important to 
investigate how often the user faces each of the task types X, Y, and Z during on-line 
search. For that purpose, we performed an automated analysis of search results for a 
sample of queries from the same TREC query collection ([13]). 

We selected 116 focused, fact finding queries whose “correct answer” can be 
matched by a regular expression. Just like in the user study, for each query we manu- 
ally selected query terms, submitted them to Google, and collected the top 10 search 
results. We processed the retrieved documents with SearchMobil to obtain information 
on the query term hits in individual sections of the page and determine the section with 
the highest relevance score. At the same time we verified whether the answer to the 
question (as expressed by the regular expression) happens to be in the best scoring 
section. Incidentally, only 57.5% of the total 1,160 search results contain the right 
answer (Figure 7, (a)). 
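The automated classification of a result page into the task types X, Y and Z can be sketched as follows. This is an illustration under assumptions, not the analysis code: the hit-counting rule, the function names and the sample texts are hypothetical, and a page is taken to be a list of section texts as produced by the page analysis.

```python
import re


def count_hits(text, query_terms):
    """Count occurrences of any query term among the words of a section."""
    terms = {t.lower() for t in query_terms}
    return sum(1 for word in re.findall(r"\w+", text.lower()) if word in terms)


def classify_page(sections, query_terms, answer_pattern):
    """Return 'X', 'Y' or 'Z', or None if the page does not contain the answer."""
    answer = re.compile(answer_pattern, re.IGNORECASE)
    if not any(answer.search(text) for text in sections):
        return None  # the answer is not on this page at all
    if len(sections) == 1:
        return "Y"   # no sub-partition into logical units
    hits = [count_hits(text, query_terms) for text in sections]
    best = max(range(len(sections)), key=hits.__getitem__)  # first section wins ties
    return "X" if answer.search(sections[best]) else "Z"


# Hypothetical page texts, for illustration only.
page = ["Shelter news and the shelter appeal", "Contact us: 1 Example Street"]
print(classify_page(page, ["shelter"], r"1 Example Street"))
```

Counting the returned labels over all retrieved pages would give distributions of the kind shown in Figure 7.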
Fig. 7. Distribution of X, Y, and Z types of tasks in: (a) the collection of all search results and
(b) the collection of results which contain the correct answers

We found that 65.1% of pages that do contain the correct answer are of type X; 
thus, the region highlighted by SearchMobil as the most relevant to the query also 
contains the correct answer. About 21.9% of the pages are of type Y, having no sub- 
partition of the page into logical units. Only 13% fall into the category Z where the 
correct answer lies outside the best scoring region for the query terms. For this 
relatively small number of pages, the indication of the best region may have an adverse 
effect on the speed of locating the answer, as observed in the user study. 
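
The classification described above can be sketched as follows. This is only an illustrative sketch: the function name `classify_page`, the representation of a page as a list of section strings, and the `best_index` argument (the section with the highest relevance score) are assumptions made for the example and are not part of SearchMobil.

```python
import re
from typing import List, Optional


def classify_page(sections: List[str],
                  best_index: Optional[int],
                  answer_pattern: str) -> Optional[str]:
    """Return 'X', 'Y' or 'Z', or None if the page lacks the answer.

    sections       -- text of each logical section (assumed input format)
    best_index     -- index of the best scoring section, None if undefined
    answer_pattern -- regular expression matching the correct answer
    """
    answer = re.compile(answer_pattern, re.IGNORECASE)
    if not any(answer.search(s) for s in sections):
        return None  # the page does not contain the correct answer
    if len(sections) <= 1 or best_index is None:
        return "Y"   # no sub-partition of the page into logical units
    if answer.search(sections[best_index]):
        return "X"   # best scoring section also contains the answer
    return "Z"       # answer lies outside the best scoring section
```

For example, `classify_page(["Paris is the capital.", "Contact us"], 0, r"\bParis\b")` yields type X, because the best scoring section holds the answer.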

The fact that about 21.9% of the pages with the correct answer (equivalent to 
12.6% of all pages) fall into category Y provides an opportunity to refine the 
page analysis further and give finer-grained feedback, perhaps at the paragraph 
level, about the relevance of the page content. 
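
The figures reported in this section are mutually consistent, which the following short check illustrates. The variable and function names are chosen for the example only; the numbers are those given in the text.

```python
# Figures reported in the text.
QUERIES = 116
RESULTS_PER_QUERY = 10
SHARE_WITH_ANSWER = 0.575          # results containing the right answer
SHARE_X, SHARE_Y, SHARE_Z = 0.651, 0.219, 0.13  # among pages with the answer


def total_results() -> int:
    """Number of search results analysed."""
    return QUERIES * RESULTS_PER_QUERY


def y_share_of_all_pages() -> float:
    """Type Y pages as a percentage of all pages, to one decimal place."""
    return round(SHARE_WITH_ANSWER * SHARE_Y * 100, 1)


def shares_sum_to_one() -> bool:
    """Check that the X, Y and Z shares cover all pages with the answer."""
    return abs(SHARE_X + SHARE_Y + SHARE_Z - 1.0) < 1e-9
```

Here `total_results()` gives the 1,160 results mentioned above, and `y_share_of_all_pages()` reproduces the 12.6% figure as 57.5% of 21.9%.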



4 Conclusions 

In order to make optimum use of the small displays on mobile devices for Web 
searching, it is necessary to separate overview and detail concerns of the search task 
into different visual renderings. We have discussed three designs that achieve this in 
different ways. SmartView provides page analyses and a compressed overview visu- 
alization to facilitate navigation to structurally significant regions of a page. Search- 
Mobil annotates that overview to show the location of search terms of interest and the 
most relevant region for the particular search context. The SearchMobil booklet view 
presents the overview of a set of retrieved pages, using visual tab properties to indicate 
the degree of their relevance. As with all overview plus detail visualizations, these 
solutions suit some tasks and information structures better than others. Our evaluation 
has confirmed this dependency, and highlights the kind of tasks that SmartView and 
SearchMobil can facilitate. 
Abstract. In this paper, we propose a new summarization method appropriate 
for sending text to mobile phones. In mobile access research, an important issue 
is how to display compact and informative summaries on a screen much smaller 
than that of an ordinary computer. Documents with varieties of genres presenting 
information such as opinions, evaluations, etc. have been published on the Web. 
Most previous summarization research, however, has focused on factual informa- 
tion and topics in documents. For a document that asserts the author’s opinion, 
we assumed that combining factual information and subjective information such 
as opinions would be effective to produce short but informative summaries ade- 
quate to comprehend the contents of the original documents. We propose a sum- 
marization method that exploits the typical text structure of the genre. We test the 
effectiveness of the proposed method by asking three users who read the genre 
of “columns” in everyday life to evaluate summaries through a recognition 
test of important sentences and a comprehension test on the original 
documents. In the comprehension test, our method, which is based on 
sentence types, was judged more informative than the existing 
methods. 



1 Introduction 

In the field of natural language processing, automatic summarization research has 
played an increasingly important role [1]. In mobile access research, an important issue 
is how to display compact and informative summaries on a screen much smaller than 
that of an ordinary computer. In this paper, we propose a new summarization method 
appropriate for mobile phones. Documents with varieties of genres presenting infor- 
mation such as opinions, evaluation, etc. have been published on the Web. Automatic 
summarization can be defined as employing technological methods to present the 
important content of texts in a condensed way that still meets the user’s 
information needs. Summarization research, however, has so far focused mainly on 
factual information and topics in documents. We use the genre property of the original 
document to produce an informative summary. Where authors have stressed their 
assertions, it is also effective to extract subjective information such as opinions, evaluations, 
prospectives, speculations, attitudes, and emotions. 

To produce summaries of the important contents of original documents, it is effec- 
tive to use the genre of the original documents and the typical structure of the text. 

F. Crestani et al. (Eds.): Mobile and Ubiquitous Info. Access Ws 2003. LNCS 2954, pp. 172-186, 2004. 
For documents that assert authors’ opinions, we assumed that summaries combining 
factual information with subjective information such as opinions, evaluations, and 
prospectives would be effective for comprehending the original documents. 

In this research, we chose columns in newspaper articles that report the economic 
situation as examples of documents that assert subjective information such as opinions: 
i.e., Japanese Nikkei Business Daily column articles “Business Today”, and Japanese 
Nikkei Financial Daily column articles “Position”. We assumed the situation of time- 
pressed businessmen obtaining compact current economic information through mobile 
phones. 

To implement our summarization method, we surveyed the text structure of columns 
as recognized by four assessors. Our approach was similar to the approaches proposed 
for scientific articles [2] and legal texts [3], but we focused explicitly on the subjective 
information. 

In addition, summarization is a process of changing the lengths of input sentences, 
and so information presentation technology for restricting size is essential [4]. Several 
approaches to summarization for mobile phones or small screens [5-7] have been 
proposed. Corston-Oliver [6] proposed several heuristic compaction rules for 
character-sensitive reduction. However, the problem of sentence-weighting approaches 
for small screens has not been discussed sufficiently. 

Sweeney et al. [5] showed that headlines and three versions of summaries for differ- 
ent compression rates were effective for information retrieval. The summaries for this 
usage were called “indicative” summaries. Mani [1, p. 8] explained this term as fol- 
lows: an indicative abstract provides a reference function for selecting documents for 
more in-depth reading. In contrast, “informative” summaries for small screens were not 
discussed sufficiently. Mani [1, p. 8] also explained this term: an informative abstract 
covers all the salient information in the source at some level of detail. Our focus is dif- 
ferent from others in that we have focused on the informativeness of the summarization. 

For evaluation, we experimented using tests of recognition of important sentences 
and comprehension of original documents by three business persons with sufficient 
expertise in the economic field. To display summaries on the very small screens of 
mobile phones, the original documents must be edited to be brief. Our goal was to 
produce informative summaries and we evaluated summaries from the viewpoint of 
“information per character”. 

This paper consists of six sections. In Section 2, we propose our summarization 
method based on text structures for columns. In Section 3, we show our results. In 
Section 4, we explain the evaluation using “important sentence recognition” tests and 
“questions for user comprehension of original documents” tests. In Section 5, we intro- 
duce our implementation briefly. Our conclusions are presented in Section 6. 



2 Methodology 



In this section, we explain the methodology of our proposal. We describe our motivation 
and give an overview of our new method, which is based on text structure and aims to 
meet users’ factual and subjective information needs. 
2.1 Our Proposal: Summarization Based on Text Structure 

We propose a summarization method based on text structure, appropriate for mobile 
phones, to produce compact, balanced, and informative summaries. We chose 
newspaper column articles about economics as examples. These texts contain both 
factual content and subjective information such as opinions and evaluations. When 
a document asserts an opinion, we assume that combining factual information and 
subjective information is an effective way to produce short but representative 
summaries with sufficient information to comprehend the contents of the original 
documents. 

This method exploits the typical text structure of the genre to meet factual and subjective 
information needs. In Section 2.2, we introduce existing summarization methods for 
factual information. Some of these are employed as weighting functions in our 
summarization strategy. In Section 2.3, the text structure of columns is detailed. Our 
summarization strategy, presented in Section 2.4, balances factual and subjective 
information. 



2.2 Existing Summarization Methods for Factual Information 

In this section, we explain summarization methods that we have already proposed to 
extract sentences including factual information [8]. 

The summarization process was carried out in two stages: important sentence 
extraction followed by transformation. Important sentence extraction is based 
on five weighting approaches, discounted by sentence length. In this experiment, we 
used three conventional weighting approaches [9], namely “sentence position 
weighting”, “words weighting in headlines”, and “words weighting based on TF*IDF”, and 
two combinations of these. We used the lead method, which is effective for newspaper 
summarization, as a baseline and compared it with the results of important sentence 
extraction based on the five weighting approaches [8]. 

Therefore, the summarization methods that we used were as follows: 

(a) The lead method (baseline) 

(b) Position weighting 

(c) Headline weighting 

(d) TF*IDF weighting 

(e) Position weighting multiplied by headline weighting (i.e., (b) x (c)) 

(f) Position weighting multiplied by the sum of headline weighting and TF*IDF 
weighting (i.e., (b) x ((c) + (d))). 

We randomly collected 10 column articles and extracted three sentences from each 
using these methods. 
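
The way the weighting approaches (b) to (f) combine can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the `Sentence` type, the assumption that the three basic weights are already computed per sentence, and the form of the length discount (division by the character count) are all hypothetical choices made for the example.

```python
from typing import Callable, Dict, List, NamedTuple


class Sentence(NamedTuple):
    text: str
    position: float  # (b) position weight, assumed precomputed
    headline: float  # (c) headline weight, assumed precomputed
    tfidf: float     # (d) TF*IDF weight, assumed precomputed


# Methods (b)-(f) as listed above; (e) and (f) combine the basic weights.
METHODS: Dict[str, Callable[[Sentence], float]] = {
    "b": lambda s: s.position,
    "c": lambda s: s.headline,
    "d": lambda s: s.tfidf,
    "e": lambda s: s.position * s.headline,
    "f": lambda s: s.position * (s.headline + s.tfidf),
}


def extract(sentences: List[Sentence], method: str, k: int = 3,
            discount_by_length: bool = False) -> List[str]:
    """Return the k highest-weighted sentences in document order."""
    weight = METHODS[method]

    def score(s: Sentence) -> float:
        w = weight(s)
        if discount_by_length:
            # Assumed form of the discount; the text does not specify it.
            w /= max(len(s.text), 1)
        return w

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i].text for i in sorted(ranked)]
```

Returning the selected sentences in their original order is a design choice of this sketch, intended to keep the extract readable as running text.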

We detail the lead method and the five basic weighting approaches. 

(a) The lead method 

The lead method is a popular and effective method for summarization of newspaper 
articles. In this strategy, the first three sentences are extracted. 
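
As a minimal sketch, the lead baseline amounts to the following, assuming the article has already been split into sentences (the function name `lead_summary` is chosen for the example):

```python
from typing import List


def lead_summary(sentences: List[str], n: int = 3) -> List[str]:
    """Return the first n sentences of an article as its summary."""
    return sentences[:n]
```

Articles with fewer than three sentences are returned unchanged.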




